Overview

Dataset statistics

Number of variables: 24
Number of observations: 45466
Missing cells: 105562
Missing cells (%): 9.7%
Total size in memory: 79.6 MiB
Average record size in memory: 1.8 KiB
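The missing-cell share is the number of missing cells divided by the total number of cells (variables × observations): 105562 / (24 × 45466) = 105562 / 1091184 ≈ 9.67%, which the report rounds to 9.7%. A minimal Python sketch of the same calculation over rows read as dictionaries (for example with `csv.DictReader`). Treating `None` and the empty string as missing is an assumption of this sketch; the profiler's own missing-value rules may differ.

```python
from typing import Iterable, Mapping, Optional, Sequence


def missing_cell_share(
    rows: Iterable[Mapping[str, Optional[str]]], columns: Sequence[str]
) -> tuple[int, float]:
    """Return (missing_cells, percent_missing) over the given columns.

    Assumption: a cell counts as missing when it is None or an empty string.
    """
    n_rows = 0
    missing = 0
    for row in rows:
        n_rows += 1
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                missing += 1
    total_cells = n_rows * len(columns)
    percent = 100.0 * missing / total_cells if total_cells else 0.0
    return missing, percent


# Cross-check the report's summary: 105562 missing cells out of 24 * 45466.
print(round(100 * 105562 / (24 * 45466), 1))  # 9.7
```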

Variable types

Text: 18
Unsupported: 1
Numeric: 4
Boolean: 1

Alerts

video is highly imbalanced (97.9%) [Imbalance]
belongs_to_collection has 40972 (90.1%) missing values [Missing]
homepage has 37684 (82.9%) missing values [Missing]
overview has 954 (2.1%) missing values [Missing]
tagline has 25054 (55.1%) missing values [Missing]
popularity is an unsupported type; check whether it needs cleaning or further analysis [Unsupported]
revenue has 38052 (83.7%) zeros [Zeros]
runtime has 1558 (3.4%) zeros [Zeros]
vote_average has 2998 (6.6%) zeros [Zeros]
vote_count has 2899 (6.4%) zeros [Zeros]
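The report does not state how the 97.9% imbalance figure for video is computed. One common way to score imbalance is one minus the normalized Shannon entropy of the value counts: 0 for a perfectly balanced column, approaching 1 when a single value dominates. The sketch below illustrates that idea only; the function name `imbalance` is hypothetical and the formula is not taken from ydata-profiling's source.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def imbalance(values: Iterable[Hashable]) -> float:
    """Return 1 - H / log(k), where H is the Shannon entropy of the value
    distribution and k the number of distinct values.

    Illustrative normalized-entropy score (an assumption, not the profiler's code).
    """
    counts = Counter(values)
    k = len(counts)
    if k < 2:
        return 1.0  # a single value is treated as maximally imbalanced here
    n = sum(counts.values())
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(k)


print(round(imbalance([True] * 50 + [False] * 50), 3))  # balanced column: close to 0
print(round(imbalance([False] * 99 + [True]), 3))       # one dominant value: above 0.9
```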

Reproduction

Analysis started: 2024-04-26 14:46:51.966536
Analysis finished: 2024-04-26 14:46:59.814800
Duration: 7.85 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json

Variables

adult
Text

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Length

Max length: 126
Median length: 5
Mean length: 5.00508072
Min length: 4
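The mean length equals the total character count divided by the number of non-missing values: 227561 / 45466 ≈ 5.00508, matching the figure above. A minimal sketch of the four length statistics, assuming missing values are represented as `None` and skipped; `length_stats` is a hypothetical helper.

```python
import statistics
from typing import Iterable, Optional


def length_stats(values: Iterable[Optional[str]]) -> dict[str, float]:
    """Max, median, mean and min string length over non-missing values (None is skipped)."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        raise ValueError("no non-missing values")
    return {
        "max": max(lengths),
        "median": statistics.median(lengths),
        "mean": statistics.fmean(lengths),
        "min": min(lengths),
    }


print(length_stats(["False", "True", "False", None]))
```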

Characters and Unicode

Total characters: 227561
Distinct characters: 34
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 2
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
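As an illustration of how counts such as Lowercase Letter or Space Separator can be derived, Python's standard `unicodedata.category` returns the two-letter general category of a character. The mapping to long names below covers only categories that appear in this report and is an assumption of this sketch, not the profiler's implementation.

```python
import unicodedata
from collections import Counter

# Long names for the general-category codes used in this report (partial mapping).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Pc": "Connector Punctuation",
}


def category_counts(text: str) -> Counter:
    """Count characters by general category (long name if known, else the two-letter code)."""
    counts: Counter = Counter()
    for ch in text:
        code = unicodedata.category(ch)
        counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


print(category_counts("False"))  # Counter({'Lowercase Letter': 4, 'Uppercase Letter': 1})
```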

Unique

Unique: 3
Unique (%): < 0.1%
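Distinct and Unique measure different things. Distinct counts how many different values occur; in this report's terminology, Unique is understood to count the values that occur exactly once. For adult, that reading gives 5 distinct values, 3 of which appear only once. A small sketch with made-up input; `distinct_and_unique` is a hypothetical helper.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional


def distinct_and_unique(values: Iterable[Optional[Hashable]]) -> tuple[int, int]:
    """Return (distinct, unique): number of different values, and number seen exactly once."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


print(distinct_and_unique(["False", "False", "True", "oops"]))  # (3, 2)
```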

Sample

1st row: False
2nd row: False
3rd row: False
4th row: False
5th row: False

Value | Count | Frequency (%)
false | 45454 | 99.9%
true | 9 | < 0.1%
to | 4 | < 0.1%
a | 4 | < 0.1%
the | 2 | < 0.1%
avalanche | 2 | < 0.1%
by | 2 | < 0.1%
when | 1 | < 0.1%
contest | 1 | < 0.1%
hit | 1 | < 0.1%
Other values (32) | 32 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 45479 | 20.0%
a | 45475 | 20.0%
s | 45465 | 20.0%
l | 45461 | 20.0%
F | 45454 | 20.0%
(space) | 49 | < 0.1%
r | 25 | < 0.1%
t | 23 | < 0.1%
o | 19 | < 0.1%
n | 17 | < 0.1%
Other values (24) | 94 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 182039 | 80.0%
Uppercase Letter | 45470 | 20.0%
Space Separator | 49 | < 0.1%
Other Punctuation | 2 | < 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 45479
25.0%
a 45475
25.0%
s 45465
25.0%
l 45461
25.0%
r 25
 
< 0.1%
t 23
 
< 0.1%
o 19
 
< 0.1%
n 17
 
< 0.1%
i 13
 
< 0.1%
u 12
 
< 0.1%
Other values (12) 50
 
< 0.1%
Uppercase Letter
ValueCountFrequency (%)
F 45454
> 99.9%
T 9
 
< 0.1%
B 1
 
< 0.1%
R 1
 
< 0.1%
Ø 1
 
< 0.1%
O 1
 
< 0.1%
W 1
 
< 0.1%
A 1
 
< 0.1%
S 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
49
100.0%
Other Punctuation
ValueCountFrequency (%)
. 2
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 227509 | > 99.9%
Common | 52 | < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 45479
20.0%
a 45475
20.0%
s 45465
20.0%
l 45461
20.0%
F 45454
20.0%
r 25
 
< 0.1%
t 23
 
< 0.1%
o 19
 
< 0.1%
n 17
 
< 0.1%
i 13
 
< 0.1%
Other values (21) 78
 
< 0.1%
Common
ValueCountFrequency (%)
49
94.2%
. 2
 
3.8%
- 1
 
1.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 227559 | > 99.9%
None | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 45479
20.0%
a 45475
20.0%
s 45465
20.0%
l 45461
20.0%
F 45454
20.0%
49
 
< 0.1%
r 25
 
< 0.1%
t 23
 
< 0.1%
o 19
 
< 0.1%
n 17
 
< 0.1%
Other values (22) 92
 
< 0.1%
None
ValueCountFrequency (%)
Ø 1
50.0%
å 1
50.0%
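adult is meant to be a True/False flag, yet the profile shows 5 distinct values, a maximum length of 126 characters, and word counts such as avalanche and contest. That suggests a few rows hold free text, possibly from misaligned fields (an inference, not confirmed by the report). A minimal sketch that coerces the flag and isolates malformed rows; `parse_bool` is a hypothetical helper and the sample input is made up.

```python
from typing import Optional


def parse_bool(raw: Optional[str]) -> Optional[bool]:
    """Map 'True'/'False' (case-insensitive, whitespace-trimmed) to bool; anything else becomes None."""
    if raw is None:
        return None
    token = raw.strip().lower()
    if token == "true":
        return True
    if token == "false":
        return False
    return None  # malformed, e.g. free text that landed in this column


values = ["False", "True", "not-a-flag", "false"]  # made-up sample
parsed = [parse_bool(v) for v in values]
malformed = [v for v, p in zip(values, parsed) if p is None]
print(parsed)     # [False, True, None, False]
print(malformed)  # ['not-a-flag']
```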

belongs_to_collection
Text

MISSING 

Distinct: 1698
Distinct (%): 37.8%
Missing: 40972
Missing (%): 90.1%
Memory size: 2.1 MiB

Length

Max length: 184
Median length: 167
Mean length: 141.4063195
Min length: 8

Characters and Unicode

Total characters: 635480
Distinct characters: 170
Distinct categories: 13
Distinct scripts: 7
Distinct blocks: 8

Unique

Unique: 393
Unique (%): 8.7%

Sample

1st row: {'id': 10194, 'name': 'Toy Story Collection', 'poster_path': '/7G9915LfUQ2lVfwMEEhDsn3kT4B.jpg', 'backdrop_path': '/9FBwqcd9IRruEDUrTdcaafOMKUq.jpg'}
2nd row: {'id': 119050, 'name': 'Grumpy Old Men Collection', 'poster_path': '/nLvUdqgPgm3F85NMCii9gVFUcet.jpg', 'backdrop_path': '/hypTnLot2z8wpFS7qwsQHW1uV8u.jpg'}
3rd row: {'id': 96871, 'name': 'Father of the Bride Collection', 'poster_path': '/nts4iOmNnq7GNicycMJ9pSAn204.jpg', 'backdrop_path': '/7qwE57OVZmMJChBpLEbJEmzUydk.jpg'}
4th row: {'id': 645, 'name': 'James Bond Collection', 'poster_path': '/HORpg5CSkmeQlAolx3bKMrKgfi.jpg', 'backdrop_path': '/6VcVl48kNKvdXOZfJPdarlUGOsk.jpg'}
5th row: {'id': 117693, 'name': 'Balto Collection', 'poster_path': '/w0ZgH6Lgxt2bQYnf1ss74UvYftm.jpg', 'backdrop_path': '/9VM5LiJV0bGb1st1KyHA3cVnO2G.jpg'}

Value | Count | Frequency (%)
name | 4497 | 9.7%
id | 4491 | 9.7%
backdrop_path | 4491 | 9.7%
poster_path | 4491 | 9.7%
collection | 3746 | 8.1%
none | 1771 | 3.8%
the | 1146 | 2.5%
of | 230 | 0.5%
series | 147 | 0.3%
(unrendered) | 139 | 0.3%
Other values (6634) | 21083 | 45.6%

Most occurring characters

Value | Count | Frequency (%)
' | 59225 | 9.3%
(space) | 41739 | 6.6%
p | 29081 | 4.6%
a | 25710 | 4.0%
o | 25040 | 3.9%
e | 24229 | 3.8%
t | 23203 | 3.7%
: | 18063 | 2.8%
n | 16731 | 2.6%
r | 15825 | 2.5%
Other values (160) | 356634 | 56.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 317110 | 49.9%
Other Punctuation | 105770 | 16.6%
Uppercase Letter | 95037 | 15.0%
Decimal Number | 56977 | 9.0%
Space Separator | 41739 | 6.6%
Connector Punctuation | 8982 | 1.4%
Open Punctuation | 4826 | 0.8%
Close Punctuation | 4826 | 0.8%
Dash Punctuation | 162 | < 0.1%
Other Letter | 37 | < 0.1%
Other values (3) | 14 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
p 29081
 
9.2%
a 25710
 
8.1%
o 25040
 
7.9%
e 24229
 
7.6%
t 23203
 
7.3%
n 16731
 
5.3%
r 15825
 
5.0%
i 15334
 
4.8%
h 14439
 
4.6%
d 13705
 
4.3%
Other values (69) 113813
35.9%
Uppercase Letter
ValueCountFrequency (%)
C 7696
 
8.1%
N 5094
 
5.4%
T 4597
 
4.8%
S 4189
 
4.4%
A 3722
 
3.9%
M 3699
 
3.9%
D 3683
 
3.9%
B 3680
 
3.9%
L 3482
 
3.7%
G 3461
 
3.6%
Other values (33) 51734
54.4%
Other Letter
ValueCountFrequency (%)
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
2
 
5.4%
Other values (4) 8
21.6%
Other Punctuation
ValueCountFrequency (%)
' 59225
56.0%
: 18063
 
17.1%
, 13552
 
12.8%
. 7386
 
7.0%
/ 7232
 
6.8%
" 214
 
0.2%
& 52
 
< 0.1%
! 35
 
< 0.1%
* 4
 
< 0.1%
? 4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 6794
11.9%
2 6109
10.7%
3 5875
10.3%
4 5783
10.1%
5 5706
10.0%
9 5483
9.6%
8 5454
9.6%
6 5372
9.4%
7 5352
9.4%
0 5049
8.9%
Open Punctuation
ValueCountFrequency (%)
{ 4491
93.1%
( 330
 
6.8%
[ 5
 
0.1%
Close Punctuation
ValueCountFrequency (%)
} 4491
93.1%
) 330
 
6.8%
] 5
 
0.1%
Dash Punctuation
ValueCountFrequency (%)
- 160
98.8%
2
 
1.2%
Space Separator
ValueCountFrequency (%)
41739
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 8982
100.0%
Final Punctuation
ValueCountFrequency (%)
9
100.0%
Modifier Letter
ValueCountFrequency (%)
3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 411733 | 64.8%
Common | 223296 | 35.1%
Cyrillic | 414 | 0.1%
Hiragana | 15 | < 0.1%
Hangul | 10 | < 0.1%
Katakana | 9 | < 0.1%
Han | 3 | < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
p 29081
 
7.1%
a 25710
 
6.2%
o 25040
 
6.1%
e 24229
 
5.9%
t 23203
 
5.6%
n 16731
 
4.1%
r 15825
 
3.8%
i 15334
 
3.7%
h 14439
 
3.5%
d 13705
 
3.3%
Other values (70) 208436
50.6%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Common
ValueCountFrequency (%)
' 59225
26.5%
41739
18.7%
: 18063
 
8.1%
, 13552
 
6.1%
_ 8982
 
4.0%
. 7386
 
3.3%
/ 7232
 
3.2%
1 6794
 
3.0%
2 6109
 
2.7%
3 5875
 
2.6%
Other values (24) 48339
21.6%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Katakana
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Han
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 634766 | 99.9%
Cyrillic | 414 | 0.1%
None | 246 | < 0.1%
Hiragana | 15 | < 0.1%
Punctuation | 14 | < 0.1%
Katakana | 12 | < 0.1%
Hangul | 10 | < 0.1%
CJK | 3 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 59225
 
9.3%
41739
 
6.6%
p 29081
 
4.6%
a 25710
 
4.1%
o 25040
 
3.9%
e 24229
 
3.8%
t 23203
 
3.7%
: 18063
 
2.8%
n 16731
 
2.6%
r 15825
 
2.5%
Other values (71) 355920
56.1%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
None
ValueCountFrequency (%)
é 45
18.3%
ä 40
16.3%
ô 35
14.2%
ò 28
11.4%
ö 19
7.7%
ó 14
 
5.7%
ı 14
 
5.7%
í 9
 
3.7%
á 4
 
1.6%
İ 4
 
1.6%
Other values (19) 34
13.8%
Punctuation
ValueCountFrequency (%)
9
64.3%
3
 
21.4%
2
 
14.3%
Katakana
ValueCountFrequency (%)
3
25.0%
3
25.0%
3
25.0%
3
25.0%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
CJK
ValueCountFrequency (%)
3
100.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
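The belongs_to_collection sample rows are Python dictionary literals: single-quoted strings, and presumably `None` for absent poster or backdrop paths, which would explain the word none in the counts. `json.loads` rejects that syntax, while the standard library's `ast.literal_eval` parses such literals safely. A sketch, with `parse_collection` as a hypothetical helper that returns `None` for missing or malformed cells:

```python
import ast
from typing import Optional


def parse_collection(raw: Optional[str]) -> Optional[dict]:
    """Parse a Python-literal dict string; return None if missing, malformed, or not a dict."""
    if raw is None or raw == "":
        return None
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None


row = (
    "{'id': 10194, 'name': 'Toy Story Collection', "
    "'poster_path': '/7G9915LfUQ2lVfwMEEhDsn3kT4B.jpg', "
    "'backdrop_path': '/9FBwqcd9IRruEDUrTdcaafOMKUq.jpg'}"
)
print(parse_collection(row)["name"])  # Toy Story Collection
```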

budget
Text

Distinct: 1226
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Memory size: 2.6 MiB

Length

Max length: 32
Median length: 1
Mean length: 2.215391721
Min length: 1

Characters and Unicode

Total characters: 100725
Distinct characters: 49
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 839
Unique (%): 1.8%

Sample

1st row: 30000000
2nd row: 65000000
3rd row: 0
4th row: 16000000
5th row: 0

Value | Count | Frequency (%)
0 | 36573 | 80.4%
5000000 | 286 | 0.6%
10000000 | 259 | 0.6%
20000000 | 243 | 0.5%
2000000 | 242 | 0.5%
15000000 | 226 | 0.5%
3000000 | 223 | 0.5%
25000000 | 206 | 0.5%
1000000 | 197 | 0.4%
30000000 | 190 | 0.4%
Other values (1216) | 6821 | 15.0%

Most occurring characters

Value | Count | Frequency (%)
0 | 84525 | 83.9%
1 | 3222 | 3.2%
5 | 3201 | 3.2%
2 | 2555 | 2.5%
3 | 1792 | 1.8%
4 | 1325 | 1.3%
6 | 1147 | 1.1%
7 | 1119 | 1.1%
8 | 1102 | 1.1%
9 | 660 | 0.7%
Other values (39) | 77 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 100648 | 99.9%
Lowercase Letter | 46 | < 0.1%
Uppercase Letter | 25 | < 0.1%
Other Punctuation | 6 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
g 5
 
10.9%
j 4
 
8.7%
z 4
 
8.7%
p 4
 
8.7%
o 3
 
6.5%
b 3
 
6.5%
f 3
 
6.5%
w 2
 
4.3%
s 2
 
4.3%
q 2
 
4.3%
Other values (12) 14
30.4%
Uppercase Letter
ValueCountFrequency (%)
W 3
12.0%
G 3
12.0%
F 2
 
8.0%
X 2
 
8.0%
V 2
 
8.0%
S 2
 
8.0%
L 2
 
8.0%
D 2
 
8.0%
R 1
 
4.0%
H 1
 
4.0%
Other values (5) 5
20.0%
Decimal Number
ValueCountFrequency (%)
0 84525
84.0%
1 3222
 
3.2%
5 3201
 
3.2%
2 2555
 
2.5%
3 1792
 
1.8%
4 1325
 
1.3%
6 1147
 
1.1%
7 1119
 
1.1%
8 1102
 
1.1%
9 660
 
0.7%
Other Punctuation
ValueCountFrequency (%)
. 3
50.0%
/ 3
50.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 100654 | 99.9%
Latin | 71 | 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
g 5
 
7.0%
j 4
 
5.6%
z 4
 
5.6%
p 4
 
5.6%
o 3
 
4.2%
b 3
 
4.2%
W 3
 
4.2%
G 3
 
4.2%
f 3
 
4.2%
w 2
 
2.8%
Other values (27) 37
52.1%
Common
ValueCountFrequency (%)
0 84525
84.0%
1 3222
 
3.2%
5 3201
 
3.2%
2 2555
 
2.5%
3 1792
 
1.8%
4 1325
 
1.3%
6 1147
 
1.1%
7 1119
 
1.1%
8 1102
 
1.1%
9 660
 
0.7%
Other values (2) 6
 
< 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 100725 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 84525
83.9%
1 3222
 
3.2%
5 3201
 
3.2%
2 2555
 
2.5%
3 1792
 
1.8%
4 1325
 
1.3%
6 1147
 
1.1%
7 1119
 
1.1%
8 1102
 
1.1%
9 660
 
0.7%
Other values (39) 77
 
0.1%
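budget is profiled as Text rather than Numeric. 36573 rows (80.4%) are 0, which probably encodes an unknown budget rather than a real zero (an interpretation, not stated in the data), and the Latin letters plus . and / in the character tables point to a few non-numeric entries. A minimal sketch that converts the column to integers; `parse_budget` and its `zero_as_missing` switch are hypothetical, and `/abc.jpg` is an invented example of a non-numeric cell.

```python
import re
from typing import Optional

DIGITS = re.compile(r"[0-9]+")  # ASCII digits only


def parse_budget(raw: Optional[str], zero_as_missing: bool = True) -> Optional[int]:
    """Convert a budget string to int.

    Returns None for missing or non-numeric text and, when zero_as_missing is True, for 0.
    """
    if raw is None:
        return None
    text = raw.strip()
    if not DIGITS.fullmatch(text):  # rejects letters, '.', '/', '-' and empty strings
        return None
    value = int(text)
    if zero_as_missing and value == 0:
        return None
    return value


print([parse_budget(v) for v in ["30000000", "0", "/abc.jpg"]])  # [30000000, None, None]
```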

genres
Text

Distinct: 4069
Distinct (%): 8.9%
Missing: 0
Missing (%): 0.0%
Memory size: 5.2 MiB

Length

Max length: 264
Median length: 225
Mean length: 62.82213082
Min length: 2

Characters and Unicode

Total characters: 2856271
Distinct characters: 56
Distinct categories: 7
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 2365
Unique (%): 5.2%

Sample

1st row: [{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
2nd row: [{'id': 12, 'name': 'Adventure'}, {'id': 14, 'name': 'Fantasy'}, {'id': 10751, 'name': 'Family'}]
3rd row: [{'id': 10749, 'name': 'Romance'}, {'id': 35, 'name': 'Comedy'}]
4th row: [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'name': 'Drama'}, {'id': 10749, 'name': 'Romance'}]
5th row: [{'id': 35, 'name': 'Comedy'}]

Value | Count | Frequency (%)
id | 91106 | 24.6%
name | 91106 | 24.6%
drama | 20265 | 5.5%
18 | 20265 | 5.5%
35 | 13182 | 3.6%
comedy | 13182 | 3.6%
53 | 7624 | 2.1%
thriller | 7624 | 2.1%
romance | 6735 | 1.8%
10749 | 6735 | 1.8%
Other values (71) | 92873 | 25.1%

Most occurring characters

Value | Count | Frequency (%)
' | 546636 | 19.1%
(space) | 325231 | 11.4%
: | 182212 | 6.4%
a | 152966 | 5.4%
e | 146936 | 5.1%
m | 144238 | 5.0%
, | 139188 | 4.9%
i | 130819 | 4.6%
n | 126822 | 4.4%
d | 107792 | 3.8%
Other values (46) | 853431 | 29.9%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 1059477 | 37.1%
Other Punctuation | 868036 | 30.4%
Space Separator | 325231 | 11.4%
Decimal Number | 234672 | 8.2%
Close Punctuation | 136572 | 4.8%
Open Punctuation | 136572 | 4.8%
Uppercase Letter | 95711 | 3.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 152966
14.4%
e 146936
13.9%
m 144238
13.6%
i 130819
12.3%
n 126822
12.0%
d 107792
10.2%
r 69131
6.5%
o 48578
 
4.6%
y 28531
 
2.7%
c 28015
 
2.6%
Other values (12) 75649
7.1%
Uppercase Letter
ValueCountFrequency (%)
D 24197
25.3%
C 17492
18.3%
A 12029
12.6%
F 9756
10.2%
T 8395
 
8.8%
R 6737
 
7.0%
H 6072
 
6.3%
M 4834
 
5.1%
S 3053
 
3.2%
W 2365
 
2.5%
Other values (6) 781
 
0.8%
Decimal Number
ValueCountFrequency (%)
1 45609
19.4%
8 39739
16.9%
5 24901
10.6%
3 23251
9.9%
7 22757
9.7%
0 21491
9.2%
9 18690
8.0%
2 17694
 
7.5%
4 13113
 
5.6%
6 7427
 
3.2%
Other Punctuation
ValueCountFrequency (%)
' 546636
63.0%
: 182212
 
21.0%
, 139188
 
16.0%
Close Punctuation
ValueCountFrequency (%)
} 91106
66.7%
] 45466
33.3%
Open Punctuation
ValueCountFrequency (%)
{ 91106
66.7%
[ 45466
33.3%
Space Separator
ValueCountFrequency (%)
325231
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 1701083 | 59.6%
Latin | 1155188 | 40.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 152966
13.2%
e 146936
12.7%
m 144238
12.5%
i 130819
11.3%
n 126822
11.0%
d 107792
9.3%
r 69131
6.0%
o 48578
 
4.2%
y 28531
 
2.5%
c 28015
 
2.4%
Other values (28) 171360
14.8%
Common
ValueCountFrequency (%)
' 546636
32.1%
325231
19.1%
: 182212
 
10.7%
, 139188
 
8.2%
} 91106
 
5.4%
{ 91106
 
5.4%
1 45609
 
2.7%
] 45466
 
2.7%
[ 45466
 
2.7%
8 39739
 
2.3%
Other values (8) 149324
 
8.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2856271 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 546636
19.1%
325231
 
11.4%
: 182212
 
6.4%
a 152966
 
5.4%
e 146936
 
5.1%
m 144238
 
5.0%
, 139188
 
4.9%
i 130819
 
4.6%
n 126822
 
4.4%
d 107792
 
3.8%
Other values (46) 853431
29.9%
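Each genres cell is a Python-literal list of `{'id': ..., 'name': ...}` dictionaries, and the word counts pair ids with names (18 and drama both occur 20265 times, 35 and comedy 13182 times). A sketch that extracts genre names with the standard `ast.literal_eval` and tallies them; `genre_names` is a hypothetical helper. An empty list `[]`, presumably the 2-character minimum length above, yields no names.

```python
import ast
from collections import Counter


def genre_names(raw: str) -> list[str]:
    """Extract the 'name' fields from a Python-literal list of genre dicts; [] if malformed."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return []
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]


rows = [
    "[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]",
    "[{'id': 35, 'name': 'Comedy'}]",
    "[]",
]
counts = Counter(name for row in rows for name in genre_names(row))
print(counts.most_common(1))  # [('Comedy', 2)]
```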

homepage
Text

MISSING 

Distinct: 7673
Distinct (%): 98.6%
Missing: 37684
Missing (%): 82.9%
Memory size: 1.8 MiB

Length

Max length: 242
Median length: 110
Mean length: 36.71279877
Min length: 13

Characters and Unicode

Total characters: 285699
Distinct characters: 91
Distinct categories: 12
Distinct scripts: 3
Distinct blocks: 4

Unique

Unique: 7610
Unique (%): 97.8%

Sample

1st row: http://toystory.disney.com/toy-story
2nd row: http://www.mgm.com/view/movie/757/Goldeneye/
3rd row: http://www.mgm.com/title_title.do?title_star=LEAVINGL
4th row: http://www.sevenmovie.com/
5th row: http://www.mgm.com/#/our-titles/2083/The-Usual-Suspects

Value | Count | Frequency (%)
http://www.georgecarlin.com | 12 | 0.2%
iso_3166_1 | 7 | 0.1%
name | 7 | 0.1%
http://www.wernerherzog.com/films-by.html | 7 | 0.1%
http://www.kungfupanda.com | 6 | 0.1%
http://breakblade.jp | 6 | 0.1%
http://www.missionimpossible.com | 5 | 0.1%
http://www.transformersmovie.com | 5 | 0.1%
http://www.thehungergames.movie | 4 | 0.1%
http://www.crownintlpictures.com/ostitles.html | 4 | 0.1%
Other values (7658) | 7753 | 99.2%

Most occurring characters

Value | Count | Frequency (%)
t | 25849 | 9.0%
/ | 25820 | 9.0%
w | 19516 | 6.8%
o | 18783 | 6.6%
e | 18709 | 6.5%
. | 15387 | 5.4%
m | 15101 | 5.3%
h | 13863 | 4.9%
i | 13654 | 4.8%
c | 11414 | 4.0%
Other values (81) | 107603 | 37.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 225336 | 78.9%
Other Punctuation | 49566 | 17.3%
Decimal Number | 4726 | 1.7%
Dash Punctuation | 3507 | 1.2%
Uppercase Letter | 1721 | 0.6%
Connector Punctuation | 471 | 0.2%
Math Symbol | 287 | 0.1%
Space Separator | 34 | < 0.1%
Open Punctuation | 24 | < 0.1%
Close Punctuation | 24 | < 0.1%
Other values (2) | 3 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
t 25849
 
11.5%
w 19516
 
8.7%
o 18783
 
8.3%
e 18709
 
8.3%
m 15101
 
6.7%
h 13863
 
6.2%
i 13654
 
6.1%
c 11414
 
5.1%
p 11166
 
5.0%
a 11155
 
5.0%
Other values (18) 66126
29.3%
Uppercase Letter
ValueCountFrequency (%)
M 145
 
8.4%
T 138
 
8.0%
S 122
 
7.1%
A 106
 
6.2%
F 101
 
5.9%
B 96
 
5.6%
E 91
 
5.3%
D 90
 
5.2%
I 90
 
5.2%
C 88
 
5.1%
Other values (16) 654
38.0%
Other Punctuation
ValueCountFrequency (%)
/ 25820
52.1%
. 15387
31.0%
: 7806
 
15.7%
? 189
 
0.4%
% 105
 
0.2%
# 85
 
0.2%
& 79
 
0.2%
' 60
 
0.1%
, 17
 
< 0.1%
! 14
 
< 0.1%
Other values (2) 4
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 853
18.0%
2 728
15.4%
1 727
15.4%
3 478
10.1%
9 349
7.4%
6 341
 
7.2%
4 331
 
7.0%
5 314
 
6.6%
8 311
 
6.6%
7 294
 
6.2%
Math Symbol
ValueCountFrequency (%)
= 271
94.4%
+ 14
 
4.9%
~ 2
 
0.7%
Open Punctuation
ValueCountFrequency (%)
( 13
54.2%
{ 8
33.3%
[ 3
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 13
54.2%
} 8
33.3%
] 3
 
12.5%
Other Letter
ValueCountFrequency (%)
1
50.0%
1
50.0%
Dash Punctuation
ValueCountFrequency (%)
- 3507
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 471
100.0%
Space Separator
ValueCountFrequency (%)
34
100.0%
Format
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 227057 | 79.5%
Common | 58640 | 20.5%
Hangul | 2 | < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
t 25849
 
11.4%
w 19516
 
8.6%
o 18783
 
8.3%
e 18709
 
8.2%
m 15101
 
6.7%
h 13863
 
6.1%
i 13654
 
6.0%
c 11414
 
5.0%
p 11166
 
4.9%
a 11155
 
4.9%
Other values (44) 67847
29.9%
Common
ValueCountFrequency (%)
/ 25820
44.0%
. 15387
26.2%
: 7806
 
13.3%
- 3507
 
6.0%
0 853
 
1.5%
2 728
 
1.2%
1 727
 
1.2%
3 478
 
0.8%
_ 471
 
0.8%
9 349
 
0.6%
Other values (25) 2514
 
4.3%
Hangul
ValueCountFrequency (%)
1
50.0%
1
50.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 285693 | > 99.9%
Hangul | 2 | < 0.1%
None | 2 | < 0.1%
Punctuation | 2 | < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 25849
 
9.0%
/ 25820
 
9.0%
w 19516
 
6.8%
o 18783
 
6.6%
e 18709
 
6.5%
. 15387
 
5.4%
m 15101
 
5.3%
h 13863
 
4.9%
i 13654
 
4.8%
c 11414
 
4.0%
Other values (75) 107597
37.7%
Hangul
ValueCountFrequency (%)
1
50.0%
1
50.0%
None
ValueCountFrequency (%)
ñ 1
50.0%
ä 1
50.0%
Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
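Most homepage values are URLs, but the value counts also include iso_3166_1 and name, which look like fragments of another field rather than web addresses (an inference from the tokens alone). A minimal validity check using the standard `urllib.parse.urlparse`; accepting only http and https with a non-empty host is an assumption of this sketch, and `is_http_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def is_http_url(raw: Optional[str]) -> bool:
    """True when raw parses as an http(s) URL with a host; False for missing values and stray tokens."""
    if not raw:
        return False
    parts = urlparse(raw.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_http_url("http://www.sevenmovie.com/"))  # True
print(is_http_url("iso_3166_1"))                  # False
```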

id
Text

Distinct: 45436
Distinct (%): 99.9%
Missing: 0
Missing (%): 0.0%
Memory size: 2.7 MiB

Length

Max length: 10
Median length: 5
Mean length: 5.251484626
Min length: 1

Characters and Unicode

Total characters: 238764
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 45407
Unique (%): 99.9%

Sample

1st row: 862
2nd row: 8844
3rd row: 15602
4th row: 31357
5th row: 11862

Value | Count | Frequency (%)
141971 | 3 | < 0.1%
159849 | 2 | < 0.1%
168538 | 2 | < 0.1%
298721 | 2 | < 0.1%
265189 | 2 | < 0.1%
5511 | 2 | < 0.1%
97995 | 2 | < 0.1%
99080 | 2 | < 0.1%
23305 | 2 | < 0.1%
119916 | 2 | < 0.1%
Other values (45426) | 45445 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
1 | 32923 | 13.8%
2 | 28625 | 12.0%
3 | 26732 | 11.2%
4 | 24747 | 10.4%
5 | 21996 | 9.2%
6 | 21184 | 8.9%
7 | 20949 | 8.8%
8 | 20909 | 8.8%
9 | 20485 | 8.6%
0 | 20208 | 8.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 238758 | > 99.9%
Dash Punctuation | 6 | < 0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1 32923
13.8%
2 28625
12.0%
3 26732
11.2%
4 24747
10.4%
5 21996
9.2%
6 21184
8.9%
7 20949
8.8%
8 20909
8.8%
9 20485
8.6%
0 20208
8.5%
Dash Punctuation
ValueCountFrequency (%)
- 6
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 238764 | 100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1 32923
13.8%
2 28625
12.0%
3 26732
11.2%
4 24747
10.4%
5 21996
9.2%
6 21184
8.9%
7 20949
8.8%
8 20909
8.8%
9 20485
8.6%
0 20208
8.5%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 238764 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1 32923
13.8%
2 28625
12.0%
3 26732
11.2%
4 24747
10.4%
5 21996
9.2%
6 21184
8.9%
7 20949
8.8%
8 20909
8.8%
9 20485
8.6%
0 20208
8.5%
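id is profiled as Text. Six hyphens (Dash Punctuation) and a maximum length of 10 suggest a handful of non-numeric entries, and some ids repeat (141971 three times). A sketch that lists both problems; `id_problems` is hypothetical and the date-like input value is invented for illustration.

```python
from collections import Counter
from typing import Iterable


def id_problems(ids: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (non_numeric, duplicated) for a column of id strings."""
    ids = list(ids)
    # An id is treated as numeric only if it is non-empty and all ASCII digits.
    non_numeric = [i for i in ids if not (i.isascii() and i.isdigit())]
    counts = Counter(ids)
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    return non_numeric, duplicated


ids = ["862", "8844", "141971", "141971", "1997-08-20"]  # last value is made up
print(id_problems(ids))  # (['1997-08-20'], ['141971'])
```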

imdb_id
Text

Distinct: 45417
Distinct (%): 99.9%
Missing: 17
Missing (%): < 0.1%
Memory size: 2.9 MiB

Length

Max length: 9
Median length: 9
Mean length: 8.999471936
Min length: 1

Characters and Unicode

Total characters: 409017
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 45387
Unique (%): 99.9%

Sample

1st row: tt0114709
2nd row: tt0113497
3rd row: tt0113228
4th row: tt0114885
5th row: tt0113041

Value | Count | Frequency (%)
tt1180333 | 3 | < 0.1%
0 | 3 | < 0.1%
tt0046468 | 2 | < 0.1%
tt1327820 | 2 | < 0.1%
tt2818654 | 2 | < 0.1%
tt0111613 | 2 | < 0.1%
tt1821641 | 2 | < 0.1%
tt0127834 | 2 | < 0.1%
tt0295682 | 2 | < 0.1%
tt0080000 | 2 | < 0.1%
Other values (45407) | 45427 | > 99.9%

Most occurring characters

Value | Count | Frequency (%)
t | 90892 | 22.2%
0 | 69913 | 17.1%
1 | 37232 | 9.1%
2 | 31234 | 7.6%
4 | 28498 | 7.0%
3 | 28135 | 6.9%
8 | 25445 | 6.2%
6 | 25442 | 6.2%
5 | 24253 | 5.9%
7 | 24221 | 5.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 318125 | 77.8%
Lowercase Letter | 90892 | 22.2%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 69913
22.0%
1 37232
11.7%
2 31234
9.8%
4 28498
9.0%
3 28135
8.8%
8 25445
 
8.0%
6 25442
 
8.0%
5 24253
 
7.6%
7 24221
 
7.6%
9 23752
 
7.5%
Lowercase Letter
ValueCountFrequency (%)
t 90892
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 318125 | 77.8%
Latin | 90892 | 22.2%

Most frequent character per script

Common
ValueCountFrequency (%)
0 69913
22.0%
1 37232
11.7%
2 31234
9.8%
4 28498
9.0%
3 28135
8.8%
8 25445
 
8.0%
6 25442
 
8.0%
5 24253
 
7.6%
7 24221
 
7.6%
9 23752
 
7.5%
Latin
ValueCountFrequency (%)
t 90892
100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 409017 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
t 90892
22.2%
0 69913
17.1%
1 37232
9.1%
2 31234
 
7.6%
4 28498
 
7.0%
3 28135
 
6.9%
8 25445
 
6.2%
6 25442
 
6.2%
5 24253
 
5.9%
7 24221
 
5.9%
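IMDb identifiers in the sample follow the pattern tt plus seven digits, while 0 appears three times, apparently as a placeholder. A minimal validator; allowing 7 or 8 digits is an assumption (newer IMDb ids can be longer), and `valid_imdb_id` is hypothetical.

```python
import re
from typing import Optional

IMDB_ID = re.compile(r"tt[0-9]{7,8}")  # assumption: 'tt' followed by 7 or 8 ASCII digits


def valid_imdb_id(raw: Optional[str]) -> bool:
    """True if raw is a complete tt-prefixed IMDb id under the pattern above."""
    return raw is not None and IMDB_ID.fullmatch(raw) is not None


print([valid_imdb_id(x) for x in ["tt0114709", "0", None]])  # [True, False, False]
```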

original_language
Text

Distinct: 92
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 2.6 MiB

Length

Max length: 5
Median length: 2
Mean length: 2.000153998
Min length: 2

Characters and Unicode

Total characters: 90917
Distinct characters: 33
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 20
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Value | Count | Frequency (%)
en | 32269 | 71.0%
fr | 2438 | 5.4%
it | 1529 | 3.4%
ja | 1350 | 3.0%
de | 1080 | 2.4%
es | 994 | 2.2%
ru | 826 | 1.8%
hi | 508 | 1.1%
ko | 444 | 1.0%
zh | 409 | 0.9%
Other values (82) | 3608 | 7.9%

Most occurring characters

Value | Count | Frequency (%)
e | 34598 | 38.1%
n | 32978 | 36.3%
r | 3636 | 4.0%
f | 2839 | 3.1%
i | 2391 | 2.6%
t | 2252 | 2.5%
a | 1841 | 2.0%
s | 1654 | 1.8%
j | 1351 | 1.5%
d | 1325 | 1.5%
Other values (23) | 6052 | 6.7%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 90904 | > 99.9%
Decimal Number | 10 | < 0.1%
Other Punctuation | 3 | < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34598
38.1%
n 32978
36.3%
r 3636
 
4.0%
f 2839
 
3.1%
i 2391
 
2.6%
t 2252
 
2.5%
a 1841
 
2.0%
s 1654
 
1.8%
j 1351
 
1.5%
d 1325
 
1.5%
Other values (16) 6039
 
6.6%
Decimal Number
ValueCountFrequency (%)
0 4
40.0%
8 2
20.0%
2 1
 
10.0%
6 1
 
10.0%
1 1
 
10.0%
4 1
 
10.0%
Other Punctuation
ValueCountFrequency (%)
. 3
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 90904 | > 99.9%
Common | 13 | < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34598
38.1%
n 32978
36.3%
r 3636
 
4.0%
f 2839
 
3.1%
i 2391
 
2.6%
t 2252
 
2.5%
a 1841
 
2.0%
s 1654
 
1.8%
j 1351
 
1.5%
d 1325
 
1.5%
Other values (16) 6039
 
6.6%
Common
ValueCountFrequency (%)
0 4
30.8%
. 3
23.1%
8 2
15.4%
2 1
 
7.7%
6 1
 
7.7%
1 1
 
7.7%
4 1
 
7.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 90917 | 100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34598
38.1%
n 32978
36.3%
r 3636
 
4.0%
f 2839
 
3.1%
i 2391
 
2.6%
t 2252
 
2.5%
a 1841
 
2.0%
s 1654
 
1.8%
j 1351
 
1.5%
d 1325
 
1.5%
Other values (23) 6052
 
6.7%
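original_language holds two-letter codes such as en and fr, but the digits and periods in the character tables and a maximum length of 5 show that a few values are not language codes. A sketch that flags them, assuming valid values are exactly two lowercase ASCII letters (ISO 639-1 style); `suspicious_language_codes` is hypothetical and the input `104.0` is an invented example.

```python
import re
from typing import Iterable, Optional

LANG_CODE = re.compile(r"[a-z]{2}")  # assumption: two lowercase ASCII letters


def suspicious_language_codes(codes: Iterable[Optional[str]]) -> list[str]:
    """Return the sorted set of non-missing codes that are not two lowercase letters."""
    return sorted({c for c in codes if c is not None and not LANG_CODE.fullmatch(c)})


print(suspicious_language_codes(["en", "fr", "ja", "104.0", None]))  # ['104.0']
```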

original_title
Text

Distinct: 43373
Distinct (%): 95.4%
Missing: 0
Missing (%): 0.0%
Memory size: 3.4 MiB

Length

Max length: 109
Median length: 84
Mean length: 16.32349448
Min length: 1

Characters and Unicode

Total characters: 742164
Distinct characters: 2946
Distinct categories: 21
Distinct scripts: 21
Distinct blocks: 29

Unique

Unique: 41712
Unique (%): 91.7%

Sample

1st row: Toy Story
2nd row: Jumanji
3rd row: Grumpier Old Men
4th row: Waiting to Exhale
5th row: Father of the Bride Part II

Value | Count | Frequency (%)
the | 10261 | 7.8%
of | 3309 | 2.5%
a | 1674 | 1.3%
in | 1275 | 1.0%
and | 1072 | 0.8%
la | 1007 | 0.8%
(unrendered) | 863 | 0.7%
to | 806 | 0.6%
de | 702 | 0.5%
man | 509 | 0.4%
Other values (35324) | 110301 | 83.7%

Most occurring characters

Value | Count | Frequency (%)
(space) | 86293 | 11.6%
e | 70665 | 9.5%
a | 49100 | 6.6%
o | 42066 | 5.7%
i | 39494 | 5.3%
n | 39149 | 5.3%
r | 37728 | 5.1%
t | 33530 | 4.5%
s | 28615 | 3.9%
l | 25557 | 3.4%
Other values (2936) | 289967 | 39.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 521249 | 70.2%
Uppercase Letter | 102496 | 13.8%
Space Separator | 86338 | 11.6%
Other Letter | 14792 | 2.0%
Other Punctuation | 10434 | 1.4%
Decimal Number | 3862 | 0.5%
Dash Punctuation | 1207 | 0.2%
Nonspacing Mark | 579 | 0.1%
Spacing Mark | 480 | 0.1%
Modifier Letter | 249 | < 0.1%
Other values (11) | 478 | 0.1%

Most frequent character per category

Other Letter
ValueCountFrequency (%)
341
 
2.3%
183
 
1.2%
141
 
1.0%
ا 117
 
0.8%
116
 
0.8%
112
 
0.8%
112
 
0.8%
84
 
0.6%
84
 
0.6%
76
 
0.5%
Other values (2400) 13426
90.8%
Lowercase Letter
ValueCountFrequency (%)
e 70665
13.6%
a 49100
 
9.4%
o 42066
 
8.1%
i 39494
 
7.6%
n 39149
 
7.5%
r 37728
 
7.2%
t 33530
 
6.4%
s 28615
 
5.5%
l 25557
 
4.9%
h 22886
 
4.4%
Other values (200) 132459
25.4%
Uppercase Letter
ValueCountFrequency (%)
T 12191
 
11.9%
S 8839
 
8.6%
M 6898
 
6.7%
B 6470
 
6.3%
L 6176
 
6.0%
C 6079
 
5.9%
A 5849
 
5.7%
D 5693
 
5.6%
H 4443
 
4.3%
P 4434
 
4.3%
Other values (121) 35424
34.6%
Nonspacing Mark
ValueCountFrequency (%)
90
15.5%
58
 
10.0%
42
 
7.3%
30
 
5.2%
28
 
4.8%
25
 
4.3%
25
 
4.3%
21
 
3.6%
21
 
3.6%
19
 
3.3%
Other values (34) 220
38.0%
Spacing Mark
ValueCountFrequency (%)
87
18.1%
37
 
7.7%
37
 
7.7%
36
 
7.5%
33
 
6.9%
ி 33
 
6.9%
ि 20
 
4.2%
19
 
4.0%
17
 
3.5%
16
 
3.3%
Other values (25) 145
30.2%
Other Punctuation
ValueCountFrequency (%)
: 3331
31.9%
' 2614
25.1%
. 1690
16.2%
, 1133
 
10.9%
! 687
 
6.6%
& 403
 
3.9%
? 253
 
2.4%
/ 84
 
0.8%
75
 
0.7%
* 20
 
0.2%
Other values (24) 144
 
1.4%
Decimal Number
ValueCountFrequency (%)
2 841
21.8%
1 688
17.8%
0 589
15.3%
3 480
12.4%
9 238
 
6.2%
4 232
 
6.0%
5 219
 
5.7%
7 210
 
5.4%
8 163
 
4.2%
6 163
 
4.2%
Other values (14) 39
 
1.0%
Math Symbol
ValueCountFrequency (%)
+ 19
34.5%
16
29.1%
× 7
 
12.7%
= 3
 
5.5%
~ 3
 
5.5%
< 2
 
3.6%
> 2
 
3.6%
1
 
1.8%
1
 
1.8%
1
 
1.8%
Dash Punctuation
ValueCountFrequency (%)
- 1165
96.5%
32
 
2.7%
4
 
0.3%
3
 
0.2%
2
 
0.2%
1
 
0.1%
Close Punctuation
ValueCountFrequency (%)
) 98
71.0%
] 13
 
9.4%
13
 
9.4%
8
 
5.8%
3
 
2.2%
} 3
 
2.2%
Open Punctuation
ValueCountFrequency (%)
( 96
70.6%
13
 
9.6%
[ 13
 
9.6%
8
 
5.9%
3
 
2.2%
{ 3
 
2.2%
Other Symbol
ValueCountFrequency (%)
10
47.6%
° 7
33.3%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other Number
ValueCountFrequency (%)
½ 10
66.7%
² 2
 
13.3%
³ 2
 
13.3%
1
 
6.7%
Final Punctuation
ValueCountFrequency (%)
32
84.2%
» 5
 
13.2%
1
 
2.6%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Format
ValueCountFrequency (%)
15
44.1%
14
41.2%
5
 
14.7%
Initial Punctuation
ValueCountFrequency (%)
« 5
71.4%
1
 
14.3%
1
 
14.3%
Letter Number
ValueCountFrequency (%)
2
50.0%
1
25.0%
1
25.0%
Space Separator
ValueCountFrequency (%)
86293
99.9%
  45
 
0.1%
Modifier Letter
ValueCountFrequency (%)
245
98.4%
4
 
1.6%
Connector Punctuation
ValueCountFrequency (%)
_ 9
100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 612843 | 82.6%
Common | 102531 | 13.8%
Cyrillic | 9023 | 1.2%
Han | 5713 | 0.8%
Katakana | 2543 | 0.3%
Hangul | 2012 | 0.3%
Greek | 1747 | 0.2%
Hiragana | 1593 | 0.2%
Arabic | 909 | 0.1%
Devanagari | 831 | 0.1%
Other values (11) | 2419 | 0.3%
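Python's standard library has no direct lookup for a character's Unicode script, so the sketch below approximates it from the first word of `unicodedata.name`. That is enough to reproduce the broad split between Latin, Common and the other scripts listed above. The heuristic and the prefix list are assumptions of this sketch, not the profiler's method, and it misclassifies some characters (for example Latin-script symbols such as ª, whose names do not start with LATIN).

```python
import unicodedata
from collections import Counter

# Name prefixes treated as scripts; everything else falls into "Common".
# Heuristic assumption: a character's Unicode name starts with its script name.
SCRIPT_PREFIXES = {
    "LATIN", "CYRILLIC", "GREEK", "ARABIC", "HEBREW", "HIRAGANA", "KATAKANA",
    "HANGUL", "DEVANAGARI", "THAI", "GEORGIAN", "ARMENIAN", "BENGALI",
    "TAMIL", "TELUGU", "MALAYALAM",
}


def approx_script(ch: str) -> str:
    """Guess the script of one character from its Unicode name."""
    first_word = unicodedata.name(ch, "").split(" ", 1)[0]
    if first_word == "CJK":  # e.g. "CJK UNIFIED IDEOGRAPH-4E00"
        return "Han"
    if first_word in SCRIPT_PREFIXES:
        return first_word.capitalize()
    return "Common"  # spaces, digits, punctuation, unnamed code points


def script_counts(text: str) -> Counter:
    return Counter(approx_script(ch) for ch in text)


print(script_counts("Toy Story"))  # Counter({'Latin': 8, 'Common': 1})
```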

Most frequent character per script

Han
ValueCountFrequency (%)
65
 
1.1%
64
 
1.1%
57
 
1.0%
56
 
1.0%
53
 
0.9%
51
 
0.9%
48
 
0.8%
47
 
0.8%
42
 
0.7%
39
 
0.7%
Other values (1491) 5191
90.9%
Hangul
ValueCountFrequency (%)
64
 
3.2%
52
 
2.6%
40
 
2.0%
30
 
1.5%
29
 
1.4%
29
 
1.4%
28
 
1.4%
26
 
1.3%
26
 
1.3%
23
 
1.1%
Other values (471) 1665
82.8%
Latin
ValueCountFrequency (%)
e 70665
 
11.5%
a 49100
 
8.0%
o 42066
 
6.9%
i 39494
 
6.4%
n 39149
 
6.4%
r 37728
 
6.2%
t 33530
 
5.5%
s 28615
 
4.7%
l 25557
 
4.2%
h 22886
 
3.7%
Other values (176) 224053
36.6%
Common
ValueCountFrequency (%)
86293
84.2%
: 3331
 
3.2%
' 2614
 
2.5%
. 1690
 
1.6%
- 1165
 
1.1%
, 1133
 
1.1%
2 841
 
0.8%
1 688
 
0.7%
! 687
 
0.7%
0 589
 
0.6%
Other values (94) 3500
 
3.4%
Katakana
ValueCountFrequency (%)
183
 
7.2%
141
 
5.5%
116
 
4.6%
112
 
4.4%
112
 
4.4%
84
 
3.3%
76
 
3.0%
75
 
2.9%
72
 
2.8%
71
 
2.8%
Other values (70) 1501
59.0%
Hiragana
ValueCountFrequency (%)
341
21.4%
84
 
5.3%
54
 
3.4%
50
 
3.1%
50
 
3.1%
49
 
3.1%
42
 
2.6%
39
 
2.4%
39
 
2.4%
36
 
2.3%
Other values (65) 809
50.8%
Cyrillic
ValueCountFrequency (%)
о 837
 
9.3%
а 771
 
8.5%
е 700
 
7.8%
и 656
 
7.3%
н 573
 
6.4%
р 505
 
5.6%
л 393
 
4.4%
т 372
 
4.1%
к 335
 
3.7%
с 323
 
3.6%
Other values (58) 3558
39.4%
Greek
ValueCountFrequency (%)
α 172
 
9.8%
ο 132
 
7.6%
ι 110
 
6.3%
τ 103
 
5.9%
ρ 85
 
4.9%
ν 71
 
4.1%
λ 69
 
3.9%
ς 68
 
3.9%
ε 63
 
3.6%
η 62
 
3.5%
Other values (49) 812
46.5%
Devanagari
ValueCountFrequency (%)
87
 
10.5%
61
 
7.3%
44
 
5.3%
42
 
5.1%
38
 
4.6%
38
 
4.6%
37
 
4.5%
33
 
4.0%
32
 
3.9%
30
 
3.6%
Other values (49) 389
46.8%
Thai
ValueCountFrequency (%)
48
 
6.7%
46
 
6.4%
46
 
6.4%
44
 
6.1%
31
 
4.3%
28
 
3.9%
27
 
3.7%
27
 
3.7%
25
 
3.5%
21
 
2.9%
Other values (46) 378
52.4%
Malayalam
ValueCountFrequency (%)
58
 
17.5%
19
 
5.7%
18
 
5.4%
17
 
5.1%
16
 
4.8%
ി 14
 
4.2%
12
 
3.6%
12
 
3.6%
11
 
3.3%
10
 
3.0%
Other values (36) 145
43.7%
Arabic
ValueCountFrequency (%)
ا 117
 
12.9%
ر 71
 
7.8%
ی 68
 
7.5%
ن 66
 
7.3%
د 56
 
6.2%
و 53
 
5.8%
ل 51
 
5.6%
ه 42
 
4.6%
ب 39
 
4.3%
م 35
 
3.9%
Other values (33) 311
34.2%
Bengali
ValueCountFrequency (%)
33
 
12.5%
25
 
9.5%
19
 
7.2%
15
 
5.7%
14
 
5.3%
13
 
4.9%
12
 
4.6%
ি 11
 
4.2%
10
 
3.8%
9
 
3.4%
Other values (32) 102
38.8%
Tamil
ValueCountFrequency (%)
90
16.1%
42
 
7.5%
37
 
6.6%
36
 
6.4%
ி 33
 
5.9%
30
 
5.4%
25
 
4.5%
25
 
4.5%
23
 
4.1%
23
 
4.1%
Other values (31) 196
35.0%
Telugu
ValueCountFrequency (%)
25
 
11.5%
16
 
7.3%
16
 
7.3%
ి 13
 
6.0%
13
 
6.0%
12
 
5.5%
11
 
5.0%
11
 
5.0%
9
 
4.1%
9
 
4.1%
Other values (23) 83
38.1%
Georgian
ValueCountFrequency (%)
25
19.2%
20
15.4%
9
 
6.9%
8
 
6.2%
7
 
5.4%
7
 
5.4%
7
 
5.4%
6
 
4.6%
6
 
4.6%
6
 
4.6%
Other values (15) 29
22.3%
Hebrew
ValueCountFrequency (%)
י 24
15.9%
ו 22
14.6%
א 11
 
7.3%
ל 9
 
6.0%
ר 9
 
6.0%
ב 9
 
6.0%
ת 9
 
6.0%
ש 7
 
4.6%
מ 7
 
4.6%
ח 6
 
4.0%
Other values (14) 38
25.2%
Armenian
ValueCountFrequency (%)
Տ 1
14.3%
վ 1
14.3%
ի 1
14.3%
թ 1
14.3%
ե 1
14.3%
ր 1
14.3%
ա 1
14.3%
Lao
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Kannada
ValueCountFrequency (%)
1
16.7%
1
16.7%
ಿ 1
16.7%
1
16.7%
1
16.7%
1
16.7%
Inherited
ValueCountFrequency (%)
14
58.3%
́ 5
 
20.8%
5
 
20.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII 710716
95.8%
Cyrillic 9023
 
1.2%
None 5949
 
0.8%
CJK 5706
 
0.8%
Katakana 2863
 
0.4%
Hangul 2012
 
0.3%
Hiragana 1593
 
0.2%
Arabic 913
 
0.1%
Devanagari 831
 
0.1%
Thai 721
 
0.1%
Other values (19) 1837
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
86293
 
12.1%
e 70665
 
9.9%
a 49100
 
6.9%
o 42066
 
5.9%
i 39494
 
5.6%
n 39149
 
5.5%
r 37728
 
5.3%
t 33530
 
4.7%
s 28615
 
4.0%
l 25557
 
3.6%
Other values (81) 258519
36.4%
Cyrillic
ValueCountFrequency (%)
о 837
 
9.3%
а 771
 
8.5%
е 700
 
7.8%
и 656
 
7.3%
н 573
 
6.4%
р 505
 
5.6%
л 393
 
4.4%
т 372
 
4.1%
к 335
 
3.7%
с 323
 
3.6%
Other values (58) 3558
39.4%
None
ValueCountFrequency (%)
é 719
 
12.1%
ä 443
 
7.4%
è 232
 
3.9%
ö 200
 
3.4%
á 181
 
3.0%
α 172
 
2.9%
í 165
 
2.8%
ó 161
 
2.7%
à 146
 
2.5%
ü 143
 
2.4%
Other values (211) 3387
56.9%
Hiragana
ValueCountFrequency (%)
341
21.4%
84
 
5.3%
54
 
3.4%
50
 
3.1%
50
 
3.1%
49
 
3.1%
42
 
2.6%
39
 
2.4%
39
 
2.4%
36
 
2.3%
Other values (65) 809
50.8%
Katakana
ValueCountFrequency (%)
245
 
8.6%
183
 
6.4%
141
 
4.9%
116
 
4.1%
112
 
3.9%
112
 
3.9%
84
 
2.9%
76
 
2.7%
75
 
2.6%
75
 
2.6%
Other values (72) 1644
57.4%
Arabic
ValueCountFrequency (%)
ا 117
 
12.8%
ر 71
 
7.8%
ی 68
 
7.4%
ن 66
 
7.2%
د 56
 
6.1%
و 53
 
5.8%
ل 51
 
5.6%
ه 42
 
4.6%
ب 39
 
4.3%
م 35
 
3.8%
Other values (35) 315
34.5%
Tamil
ValueCountFrequency (%)
90
16.1%
42
 
7.5%
37
 
6.6%
36
 
6.4%
ி 33
 
5.9%
30
 
5.4%
25
 
4.5%
25
 
4.5%
23
 
4.1%
23
 
4.1%
Other values (31) 196
35.0%
Devanagari
ValueCountFrequency (%)
87
 
10.5%
61
 
7.3%
44
 
5.3%
42
 
5.1%
38
 
4.6%
38
 
4.6%
37
 
4.5%
33
 
4.0%
32
 
3.9%
30
 
3.6%
Other values (49) 389
46.8%
CJK
ValueCountFrequency (%)
65
 
1.1%
64
 
1.1%
57
 
1.0%
56
 
1.0%
53
 
0.9%
51
 
0.9%
48
 
0.8%
47
 
0.8%
42
 
0.7%
39
 
0.7%
Other values (1487) 5184
90.9%
Hangul
ValueCountFrequency (%)
64
 
3.2%
52
 
2.6%
40
 
2.0%
30
 
1.5%
29
 
1.4%
29
 
1.4%
28
 
1.4%
26
 
1.3%
26
 
1.3%
23
 
1.1%
Other values (471) 1665
82.8%
Malayalam
ValueCountFrequency (%)
58
 
17.5%
19
 
5.7%
18
 
5.4%
17
 
5.1%
16
 
4.8%
ി 14
 
4.2%
12
 
3.6%
12
 
3.6%
11
 
3.3%
10
 
3.0%
Other values (36) 145
43.7%
Thai
ValueCountFrequency (%)
48
 
6.7%
46
 
6.4%
46
 
6.4%
44
 
6.1%
31
 
4.3%
28
 
3.9%
27
 
3.7%
27
 
3.7%
25
 
3.5%
21
 
2.9%
Other values (46) 378
52.4%
Bengali
ValueCountFrequency (%)
33
 
12.5%
25
 
9.5%
19
 
7.2%
15
 
5.7%
14
 
5.3%
13
 
4.9%
12
 
4.6%
ি 11
 
4.2%
10
 
3.8%
9
 
3.4%
Other values (32) 102
38.8%
Punctuation
ValueCountFrequency (%)
32
26.7%
32
26.7%
15
12.5%
14
11.7%
11
 
9.2%
5
 
4.2%
3
 
2.5%
2
 
1.7%
2
 
1.7%
1
 
0.8%
Other values (3) 3
 
2.5%
Telugu
ValueCountFrequency (%)
25
 
11.5%
16
 
7.3%
16
 
7.3%
ి 13
 
6.0%
13
 
6.0%
12
 
5.5%
11
 
5.0%
11
 
5.0%
9
 
4.1%
9
 
4.1%
Other values (23) 83
38.1%
Georgian
ValueCountFrequency (%)
25
19.2%
20
15.4%
9
 
6.9%
8
 
6.2%
7
 
5.4%
7
 
5.4%
7
 
5.4%
6
 
4.6%
6
 
4.6%
6
 
4.6%
Other values (15) 29
22.3%
Hebrew
ValueCountFrequency (%)
י 24
15.9%
ו 22
14.6%
א 11
 
7.3%
ל 9
 
6.0%
ר 9
 
6.0%
ב 9
 
6.0%
ת 9
 
6.0%
ש 7
 
4.6%
מ 7
 
4.6%
ח 6
 
4.0%
Other values (14) 38
25.2%
Misc Symbols
ValueCountFrequency (%)
10
83.3%
1
 
8.3%
1
 
8.3%
Diacriticals
ValueCountFrequency (%)
́ 5
100.0%
Number Forms
ValueCountFrequency (%)
2
40.0%
1
20.0%
1
20.0%
1
20.0%
Latin Ext Additional
ValueCountFrequency (%)
2
14.3%
2
14.3%
1
7.1%
1
7.1%
1
7.1%
1
7.1%
1
7.1%
1
7.1%
1
7.1%
1
7.1%
Other values (2) 2
14.3%
CJK Compat Ideographs
ValueCountFrequency (%)
1
50.0%
1
50.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
CJK Ext A
ValueCountFrequency (%)
1
100.0%
Kannada
ValueCountFrequency (%)
1
16.7%
1
16.7%
ಿ 1
16.7%
1
16.7%
1
16.7%
1
16.7%
Math Operators
ValueCountFrequency (%)
1
100.0%
Arrows
ValueCountFrequency (%)
1
100.0%
Armenian
ValueCountFrequency (%)
Տ 1
14.3%
վ 1
14.3%
ի 1
14.3%
թ 1
14.3%
ե 1
14.3%
ր 1
14.3%
ա 1
14.3%
Lao
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%

overview
Text

MISSING 

Distinct44307
Distinct (%)99.5%
Missing954
Missing (%)2.1%
Memory size17.8 MiB

Length

Max length1000
Median length785
Mean length323.3215537
Min length1

Characters and Unicode

Total characters14391689
Distinct characters429
Distinct categories25 ?
Distinct scripts13 ?
Distinct blocks21 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
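The category names in these tables (Lowercase Letter, Space Separator, Other Punctuation and so on) are the long names of the Unicode General Category property. As a rough cross-check, a minimal sketch with Python's standard `unicodedata` module can tally characters per category. The `CATEGORY_NAMES` mapping is a hand-written, partial table added for illustration, and this is not ydata-profiling's own implementation. Script and block lookups are not shown because the standard library does not expose them.

```python
import unicodedata
from collections import Counter

# Partial, hand-written map from the two-letter General_Category codes
# returned by unicodedata.category() to their long names (illustrative only).
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Lo": "Other Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Mn": "Nonspacing Mark",
}


def category_counts(texts):
    """Count characters per Unicode General Category across an iterable of strings.

    Codes missing from CATEGORY_NAMES are kept as their two-letter code.
    """
    counts = Counter()
    for text in texts:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts
```

For example, `category_counts(["Ab 1.", "-("])` counts one character each in Uppercase Letter, Lowercase Letter, Space Separator, Decimal Number, Other Punctuation, Dash Punctuation and Open Punctuation.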

Unique

Unique44247 ?
Unique (%)99.4%
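Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts values that occur exactly once. Both percentages appear to be taken over non-missing rows; for overview, 44307 / (45466 - 954) = 44307 / 44512 ≈ 99.5% and 44247 / 44512 ≈ 99.4%, matching the figures above. A small sketch of the two counts, treating `None` as missing (an assumption of this example):

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (n_distinct, n_unique) over the non-missing values.

    n_distinct: number of different values.
    n_unique:   number of values that occur exactly once.
    None is treated as missing and skipped.
    """
    counts = Counter(v for v in values if v is not None)
    n_distinct = len(counts)
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_distinct, n_unique
```

With `["a", "b", "b", "c", None]`, there are three distinct values but only two unique ones, because "b" appears twice.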

Sample

1st rowLed by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd rowWhen siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd rowA family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th rowCheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th rowJust when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.
ValueCountFrequency (%)
the 138357
 
5.6%
a 99037
 
4.0%
and 75407
 
3.1%
to 73442
 
3.0%
of 69723
 
2.8%
in 48228
 
2.0%
is 36550
 
1.5%
his 36210
 
1.5%
with 23933
 
1.0%
her 21518
 
0.9%
Other values (97181) 1830623
74.6%
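The word table above lists lower-cased tokens by frequency. The sketch below produces the same kind of count with `collections.Counter`; the regular expression used to split words is an assumption for illustration and may not match the profiler's tokenizer, so exact counts can differ.

```python
import re
from collections import Counter


def word_counts(texts, top=10):
    """Lower-case each text, split it into word tokens and count them.

    Tokens are runs of word characters and apostrophes (an assumption of
    this sketch). Missing texts (None) are skipped. Pass top=None to get
    every token.
    """
    counter = Counter()
    for text in texts:
        if text is None:
            continue
        counter.update(re.findall(r"[\w']+", text.lower()))
    return counter.most_common(top)
```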

Most occurring characters

ValueCountFrequency (%)
2410599
16.7%
e 1366183
 
9.5%
a 942278
 
6.5%
t 936476
 
6.5%
i 853105
 
5.9%
o 831419
 
5.8%
n 824147
 
5.7%
s 769188
 
5.3%
r 745638
 
5.2%
h 601821
 
4.2%
Other values (419) 4110835
28.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11170227
77.6%
Space Separator 2410637
 
16.8%
Uppercase Letter 391751
 
2.7%
Other Punctuation 313382
 
2.2%
Decimal Number 42329
 
0.3%
Dash Punctuation 36848
 
0.3%
Close Punctuation 10112
 
0.1%
Open Punctuation 10090
 
0.1%
Final Punctuation 4560
 
< 0.1%
Initial Punctuation 884
 
< 0.1%
Other values (15) 869
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1366183
12.2%
a 942278
 
8.4%
t 936476
 
8.4%
i 853105
 
7.6%
o 831419
 
7.4%
n 824147
 
7.4%
s 769188
 
6.9%
r 745638
 
6.7%
h 601821
 
5.4%
l 479703
 
4.3%
Other values (142) 2820269
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42831
 
10.9%
T 36041
 
9.2%
S 31203
 
8.0%
M 24000
 
6.1%
B 23750
 
6.1%
C 22837
 
5.8%
H 19463
 
5.0%
W 18685
 
4.8%
I 16837
 
4.3%
D 16347
 
4.2%
Other values (77) 139757
35.7%
Other Letter
ValueCountFrequency (%)
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
2
 
1.6%
Other values (76) 88
70.4%
Other Punctuation
ValueCountFrequency (%)
, 133694
42.7%
. 124991
39.9%
' 31173
 
9.9%
" 11693
 
3.7%
: 3306
 
1.1%
? 2765
 
0.9%
; 2496
 
0.8%
! 1546
 
0.5%
/ 769
 
0.2%
& 455
 
0.1%
Other values (12) 494
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
ి 4
12.1%
́ 4
12.1%
̈ 3
9.1%
3
9.1%
3
9.1%
3
9.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
Other values (4) 5
15.2%
Decimal Number
ValueCountFrequency (%)
1 9770
23.1%
0 8292
19.6%
9 6422
15.2%
2 4265
10.1%
5 2446
 
5.8%
8 2384
 
5.6%
3 2346
 
5.5%
4 2181
 
5.2%
7 2135
 
5.0%
6 2088
 
4.9%
Spacing Mark
ValueCountFrequency (%)
11
40.7%
4
 
14.8%
3
 
11.1%
3
 
11.1%
ि 2
 
7.4%
2
 
7.4%
1
 
3.7%
ி 1
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 35321
95.9%
885
 
2.4%
633
 
1.7%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 45
70.3%
14
 
21.9%
° 2
 
3.1%
¦ 2
 
3.1%
1
 
1.6%
Math Symbol
ValueCountFrequency (%)
~ 20
46.5%
+ 12
27.9%
= 6
 
14.0%
| 4
 
9.3%
1
 
2.3%
Open Punctuation
ValueCountFrequency (%)
( 10036
99.5%
[ 51
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 318
96.4%
£ 10
 
3.0%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2410599
> 99.9%
  36
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10060
99.5%
] 50
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3850
84.4%
691
 
15.2%
» 19
 
0.4%
Initial Punctuation
ValueCountFrequency (%)
673
76.1%
193
 
21.8%
« 18
 
2.0%
Control
ValueCountFrequency (%)
106
96.4%
’ 3
 
2.7%
 1
 
0.9%
Modifier Symbol
ValueCountFrequency (%)
´ 25
65.8%
` 12
31.6%
¯ 1
 
2.6%
Format
ValueCountFrequency (%)
31
60.8%
­ 20
39.2%
Other Number
ValueCountFrequency (%)
½ 8
50.0%
¹ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11556746
80.3%
Common 2829524
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Han 10
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (3) 19
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1366183
11.8%
a 942278
 
8.2%
t 936476
 
8.1%
i 853105
 
7.4%
o 831419
 
7.2%
n 824147
 
7.1%
s 769188
 
6.7%
r 745638
 
6.5%
h 601821
 
5.2%
l 479703
 
4.2%
Other values (132) 3206788
27.7%
Common
ValueCountFrequency (%)
2410599
85.2%
, 133694
 
4.7%
. 124991
 
4.4%
- 35321
 
1.2%
' 31173
 
1.1%
" 11693
 
0.4%
) 10060
 
0.4%
( 10036
 
0.4%
1 9770
 
0.3%
0 8292
 
0.3%
Other values (71) 43895
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
η 36
 
5.6%
ι 36
 
5.6%
ν 34
 
5.2%
ε 31
 
4.8%
ρ 31
 
4.8%
π 30
 
4.6%
ς 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14373677
99.9%
Punctuation 7281
 
0.1%
None 5933
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%
CJK 10
 
< 0.1%
Other values (11) 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2410599
16.8%
e 1366183
 
9.5%
a 942278
 
6.6%
t 936476
 
6.5%
i 853105
 
5.9%
o 831419
 
5.8%
n 824147
 
5.7%
s 769188
 
5.4%
r 745638
 
5.2%
h 601821
 
4.2%
Other values (82) 4092823
28.5%
Punctuation
ValueCountFrequency (%)
3850
52.9%
885
 
12.2%
691
 
9.5%
673
 
9.2%
633
 
8.7%
304
 
4.2%
193
 
2.7%
31
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1552
26.2%
ä 294
 
5.0%
á 293
 
4.9%
ö 250
 
4.2%
í 244
 
4.1%
è 209
 
3.5%
ü 178
 
3.0%
ı 165
 
2.8%
ó 164
 
2.8%
ç 158
 
2.7%
Other values (141) 2426
40.9%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Alphabetic PF
ValueCountFrequency (%)
4
100.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
Number Forms
ValueCountFrequency (%)
2
100.0%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Unsupported

REJECTED  UNSUPPORTED 

Missing5
Missing (%)< 0.1%
Memory size1.8 MiB
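popularity is reported as Unsupported, which here suggests the column mixes numeric and non-numeric entries that the profiler could not summarise as a number. One way to check, sketched with `pandas.to_numeric(errors="coerce")`, is to convert the column and keep the entries that failed to parse for inspection. The helper name `coerce_numeric` and the sample values in the test are hypothetical.

```python
import pandas as pd


def coerce_numeric(series: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Convert a mixed-type column to float64.

    Returns the converted series and the original values that failed to
    parse (those become NaN in the converted series).
    """
    converted = pd.to_numeric(series, errors="coerce")
    # Present before conversion but NaN after it means the value did not parse.
    failed = series[series.notna() & converted.isna()]
    return converted.astype("float64"), failed
```

Reviewing `failed` before dropping rows shows exactly which raw values would be discarded.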

poster_path
Text

Distinct45024
Distinct (%)99.9%
Missing386
Missing (%)0.8%
Memory size3.8 MiB

Length

Max length35
Median length32
Mean length31.97162822
Min length12

Characters and Unicode

Total characters1441281
Distinct characters66
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique44977 ?
Unique (%)99.8%

Sample

1st row/rhIRbceoE9lR4veEXuwCC2wARtG.jpg
2nd row/vzmL6fP7aPKNKPRTFnZmiUfciyV.jpg
3rd row/6ksm1sjKMFLbO7UY2i6G1ju9SML.jpg
4th row/16XOMpEaLWkrcPqSQqhTmeJuqQl.jpg
5th row/e64sOI48hQXyru7naBFyssKFxVd.jpg
ValueCountFrequency (%)
5d7ubsegdyone6lql6xs7s6olcw.jpg 5
 
< 0.1%
qw1oqlohizrhxzqrpkimyr0oxzn.jpg 4
 
< 0.1%
2kslzxoaw0hmnguvpcnqlcdxfr9.jpg 4
 
< 0.1%
cdwvc18urfedqjjxqjyrmogdc0h.jpg 3
 
< 0.1%
8vsz9coczxocw2we2qene1h1fko.jpg 3
 
< 0.1%
bql0pvhbq8jmw3njcl38kw0coem.jpg 2
 
< 0.1%
w56oo9nrecf54snxvyue9qxzfjt.jpg 2
 
< 0.1%
xue1ilucohbxmy0fiqktt6d013n.jpg 2
 
< 0.1%
g21ruzz3bzeudukmb82kejjtufk.jpg 2
 
< 0.1%
iqd7zwhsece3cgdpclidxjgfdzl.jpg 2
 
< 0.1%
Other values (45020) 45057
99.9%

Most occurring characters

ValueCountFrequency (%)
g 65293
 
4.5%
p 65148
 
4.5%
j 65043
 
4.5%
/ 45077
 
3.1%
. 45077
 
3.1%
v 20444
 
1.4%
d 20329
 
1.4%
m 20322
 
1.4%
q 20256
 
1.4%
t 20248
 
1.4%
Other values (56) 1054044
73.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 659135
45.7%
Uppercase Letter 492145
34.1%
Decimal Number 199840
 
13.9%
Other Punctuation 90155
 
6.3%
Space Separator 6
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
g 65293
 
9.9%
p 65148
 
9.9%
j 65043
 
9.9%
v 20444
 
3.1%
d 20329
 
3.1%
m 20322
 
3.1%
q 20256
 
3.1%
t 20248
 
3.1%
n 20233
 
3.1%
l 20232
 
3.1%
Other values (16) 321587
48.8%
Uppercase Letter
ValueCountFrequency (%)
A 19398
 
3.9%
R 19191
 
3.9%
M 19170
 
3.9%
C 19151
 
3.9%
W 19140
 
3.9%
V 19138
 
3.9%
T 18983
 
3.9%
K 18965
 
3.9%
L 18955
 
3.9%
D 18953
 
3.9%
Other values (16) 301101
61.2%
Decimal Number
ValueCountFrequency (%)
1 20218
10.1%
8 20217
10.1%
3 20157
10.1%
9 20106
10.1%
5 20101
10.1%
2 20051
10.0%
6 20009
10.0%
4 20007
10.0%
7 19898
10.0%
0 19076
9.5%
Other Punctuation
ValueCountFrequency (%)
/ 45077
50.0%
. 45077
50.0%
: 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
6
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1151280
79.9%
Common 290001
 
20.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
g 65293
 
5.7%
p 65148
 
5.7%
j 65043
 
5.6%
v 20444
 
1.8%
d 20329
 
1.8%
m 20322
 
1.8%
q 20256
 
1.8%
t 20248
 
1.8%
n 20233
 
1.8%
l 20232
 
1.8%
Other values (42) 813732
70.7%
Common
ValueCountFrequency (%)
/ 45077
15.5%
. 45077
15.5%
1 20218
7.0%
8 20217
7.0%
3 20157
7.0%
9 20106
6.9%
5 20101
6.9%
2 20051
6.9%
6 20009
6.9%
4 20007
6.9%
Other values (4) 38981
13.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1441281
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
g 65293
 
4.5%
p 65148
 
4.5%
j 65043
 
4.5%
/ 45077
 
3.1%
. 45077
 
3.1%
v 20444
 
1.4%
d 20329
 
1.4%
m 20322
 
1.4%
q 20256
 
1.4%
t 20248
 
1.4%
Other values (56) 1054044
73.1%

production_companies
Text

Distinct22708
Distinct (%)49.9%
Missing3
Missing (%)< 0.1%
Memory size5.6 MiB

Length

Max length1252
Median length954
Mean length70.09882762
Min length2

Characters and Unicode

Total characters3186903
Distinct characters293
Distinct categories15 ?
Distinct scripts6 ?
Distinct blocks6 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique20344 ?
Unique (%)44.7%

Sample

1st row[{'name': 'Pixar Animation Studios', 'id': 3}]
2nd row[{'name': 'TriStar Pictures', 'id': 559}, {'name': 'Teitler Film', 'id': 2550}, {'name': 'Interscope Communications', 'id': 10201}]
3rd row[{'name': 'Warner Bros.', 'id': 6194}, {'name': 'Lancaster Gate', 'id': 19464}]
4th row[{'name': 'Twentieth Century Fox Film Corporation', 'id': 306}]
5th row[{'name': 'Sandollar Productions', 'id': 5842}, {'name': 'Touchstone Pictures', 'id': 9195}]
ValueCountFrequency (%)
id 70546
 
17.6%
name 70546
 
17.6%
12719
 
3.2%
films 9457
 
2.4%
pictures 9267
 
2.3%
productions 9061
 
2.3%
film 6679
 
1.7%
entertainment 5156
 
1.3%
corporation 2190
 
0.5%
company 1769
 
0.4%
Other values (42195) 203834
50.8%
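The most frequent "words" in production_companies are id and name (70546 each) because each cell is a serialized list of Python dictionaries, as the sample rows show, and the profiler counts the raw string. The single quotes make these cells invalid JSON, so a sketch with `ast.literal_eval` is one way to recover the company names; the function name `parse_name_list` is hypothetical. The same approach should apply to production_countries, whose dictionaries also carry a `name` key.

```python
import ast


def parse_name_list(cell):
    """Parse a cell such as "[{'name': 'Pixar Animation Studios', 'id': 3}]"
    into a list of names.

    Missing, malformed or non-list cells yield an empty list.
    """
    if not isinstance(cell, str):
        return []
    try:
        items = ast.literal_eval(cell)  # safely evaluates Python literals only
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, list):
        return []
    return [item["name"] for item in items if isinstance(item, dict) and "name" in item]
```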

Most occurring characters

ValueCountFrequency (%)
' 422867
 
13.3%
355774
 
11.2%
i 177505
 
5.6%
e 165212
 
5.2%
n 160535
 
5.0%
a 147709
 
4.6%
: 141099
 
4.4%
m 114830
 
3.6%
, 107909
 
3.4%
d 104017
 
3.3%
Other values (283) 1289446
40.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1410509
44.3%
Other Punctuation 680026
21.3%
Space Separator 355774
 
11.2%
Decimal Number 295745
 
9.3%
Uppercase Letter 199007
 
6.2%
Open Punctuation 120335
 
3.8%
Close Punctuation 120334
 
3.8%
Dash Punctuation 4331
 
0.1%
Math Symbol 662
 
< 0.1%
Other Letter 140
 
< 0.1%
Other values (5) 40
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 177505
12.6%
e 165212
11.7%
n 160535
11.4%
a 147709
10.5%
m 114830
8.1%
d 104017
7.4%
o 85310
 
6.0%
r 83560
 
5.9%
t 83459
 
5.9%
s 62684
 
4.4%
Other values (102) 225688
16.0%
Other Letter
ValueCountFrequency (%)
9
 
6.4%
8
 
5.7%
6
 
4.3%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
4
 
2.9%
3
 
2.1%
Other values (62) 85
60.7%
Uppercase Letter
ValueCountFrequency (%)
P 27882
14.0%
F 26367
13.2%
C 20589
 
10.3%
M 13363
 
6.7%
S 11914
 
6.0%
E 9750
 
4.9%
A 9550
 
4.8%
T 9357
 
4.7%
B 9006
 
4.5%
G 7812
 
3.9%
Other values (52) 53417
26.8%
Other Punctuation
ValueCountFrequency (%)
' 422867
62.2%
: 141099
 
20.7%
, 107909
 
15.9%
. 5671
 
0.8%
" 987
 
0.1%
& 765
 
0.1%
/ 645
 
0.1%
! 36
 
< 0.1%
% 18
 
< 0.1%
\ 12
 
< 0.1%
Other values (6) 17
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
1 45079
15.2%
2 33554
11.3%
3 31849
10.8%
4 30685
10.4%
6 28094
9.5%
5 27816
9.4%
8 25853
8.7%
7 24553
8.3%
9 24362
8.2%
0 23900
8.1%
Close Punctuation
ValueCountFrequency (%)
} 70545
58.6%
] 45469
37.8%
) 4319
 
3.6%
1
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
{ 70545
58.6%
[ 45469
37.8%
( 4320
 
3.6%
1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4329
> 99.9%
2
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 661
99.8%
| 1
 
0.2%
Other Symbol
ValueCountFrequency (%)
° 23
92.0%
2
 
8.0%
Final Punctuation
ValueCountFrequency (%)
3
50.0%
» 3
50.0%
Other Number
ValueCountFrequency (%)
½ 1
50.0%
² 1
50.0%
Space Separator
ValueCountFrequency (%)
355774
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1609113
50.5%
Common 1577245
49.5%
Cyrillic 373
 
< 0.1%
Hangul 115
 
< 0.1%
Greek 31
 
< 0.1%
Han 26
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 177505
11.0%
e 165212
 
10.3%
n 160535
 
10.0%
a 147709
 
9.2%
m 114830
 
7.1%
d 104017
 
6.5%
o 85310
 
5.3%
r 83560
 
5.2%
t 83459
 
5.2%
s 62684
 
3.9%
Other values (99) 424292
26.4%
Hangul
ValueCountFrequency (%)
9
 
7.8%
8
 
7.0%
6
 
5.2%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
4
 
3.5%
3
 
2.6%
Other values (43) 60
52.2%
Common
ValueCountFrequency (%)
' 422867
26.8%
355774
22.6%
: 141099
 
8.9%
, 107909
 
6.8%
} 70545
 
4.5%
{ 70545
 
4.5%
[ 45469
 
2.9%
] 45469
 
2.9%
1 45079
 
2.9%
2 33554
 
2.1%
Other values (36) 238935
15.1%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
ь 16
 
4.3%
с 16
 
4.3%
е 16
 
4.3%
Other values (36) 159
42.6%
Greek
ValueCountFrequency (%)
ο 3
 
9.7%
ν 3
 
9.7%
η 2
 
6.5%
λ 2
 
6.5%
Ε 2
 
6.5%
ι 2
 
6.5%
ρ 2
 
6.5%
τ 2
 
6.5%
Κ 2
 
6.5%
ό 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 3180679
99.8%
None 5706
 
0.2%
Cyrillic 373
 
< 0.1%
Hangul 113
 
< 0.1%
CJK 26
 
< 0.1%
Punctuation 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 422867
 
13.3%
355774
 
11.2%
i 177505
 
5.6%
e 165212
 
5.2%
n 160535
 
5.0%
a 147709
 
4.6%
: 141099
 
4.4%
m 114830
 
3.6%
, 107909
 
3.4%
d 104017
 
3.3%
Other values (78) 1283222
40.3%
None
ValueCountFrequency (%)
é 3176
55.7%
ó 416
 
7.3%
á 317
 
5.6%
í 173
 
3.0%
ü 154
 
2.7%
ñ 150
 
2.6%
ô 140
 
2.5%
ä 137
 
2.4%
è 136
 
2.4%
ö 132
 
2.3%
Other values (75) 775
 
13.6%
Cyrillic
ValueCountFrequency (%)
и 34
 
9.1%
о 28
 
7.5%
а 26
 
7.0%
л 22
 
5.9%
н 20
 
5.4%
м 19
 
5.1%
т 17
 
4.6%
ь 16
 
4.3%
с 16
 
4.3%
е 16
 
4.3%
Other values (36) 159
42.6%
Hangul
ValueCountFrequency (%)
9
 
8.0%
8
 
7.1%
6
 
5.3%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
4
 
3.5%
3
 
2.7%
Other values (42) 58
51.3%
Punctuation
ValueCountFrequency (%)
3
50.0%
2
33.3%
1
 
16.7%
CJK
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

production_countries
Text

Distinct2393
Distinct (%)5.3%
Missing3
Missing (%)< 0.1%
Memory size4.8 MiB

Length

Max length1039
Median length649
Mean length53.20049271
Min length2

Characters and Unicode

Total characters2418654
Distinct characters69
Distinct categories8 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1768 ?
Unique (%)3.9%

Sample

1st row[{'iso_3166_1': 'US', 'name': 'United States of America'}]
2nd row[{'iso_3166_1': 'US', 'name': 'United States of America'}]
3rd row[{'iso_3166_1': 'US', 'name': 'United States of America'}]
4th row[{'iso_3166_1': 'US', 'name': 'United States of America'}]
5th row[{'iso_3166_1': 'US', 'name': 'United States of America'}]
ValueCountFrequency (%)
iso_3166_1 49423
18.1%
name 49423
18.1%
united 25275
9.2%
states 21154
7.7%
of 21153
7.7%
america 21153
7.7%
us 21153
7.7%
6282
 
2.3%
gb 4094
 
1.5%
kingdom 4094
 
1.5%
Other values (344) 50140
18.3%

Most occurring characters

ValueCountFrequency (%)
' 395379
16.3%
227881
 
9.4%
e 130095
 
5.4%
a 119929
 
5.0%
i 107991
 
4.5%
6 98847
 
4.1%
_ 98846
 
4.1%
1 98846
 
4.1%
: 98846
 
4.1%
n 96933
 
4.0%
Other values (59) 945061
39.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 904689
37.4%
Other Punctuation 553906
22.9%
Decimal Number 247121
 
10.2%
Space Separator 227881
 
9.4%
Uppercase Letter 196445
 
8.1%
Connector Punctuation 98846
 
4.1%
Open Punctuation 94883
 
3.9%
Close Punctuation 94883
 
3.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 130095
14.4%
a 119929
13.3%
i 107991
11.9%
n 96933
10.7%
o 79012
8.7%
m 78136
8.6%
s 74119
8.2%
t 72641
8.0%
d 34558
 
3.8%
r 32498
 
3.6%
Other values (16) 78777
8.7%
Uppercase Letter
ValueCountFrequency (%)
U 48407
24.6%
S 46889
23.9%
A 25531
13.0%
F 8678
 
4.4%
R 7997
 
4.1%
I 7601
 
3.9%
G 6924
 
3.5%
K 6810
 
3.5%
B 5860
 
3.0%
C 5367
 
2.7%
Other values (16) 26381
13.4%
Decimal Number
ValueCountFrequency (%)
6 98847
40.0%
1 98846
40.0%
3 49424
20.0%
0 2
 
< 0.1%
7 1
 
< 0.1%
4 1
 
< 0.1%
Other Punctuation
ValueCountFrequency (%)
' 395379
71.4%
: 98846
 
17.8%
, 59668
 
10.8%
" 10
 
< 0.1%
. 3
 
< 0.1%
Open Punctuation
ValueCountFrequency (%)
{ 49423
52.1%
[ 45460
47.9%
Close Punctuation
ValueCountFrequency (%)
} 49423
52.1%
] 45460
47.9%
Space Separator
ValueCountFrequency (%)
227881
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 98846
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1317520
54.5%
Latin 1101134
45.5%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 130095
11.8%
a 119929
10.9%
i 107991
9.8%
n 96933
 
8.8%
o 79012
 
7.2%
m 78136
 
7.1%
s 74119
 
6.7%
t 72641
 
6.6%
U 48407
 
4.4%
S 46889
 
4.3%
Other values (42) 246982
22.4%
Common
ValueCountFrequency (%)
' 395379
30.0%
227881
17.3%
6 98847
 
7.5%
_ 98846
 
7.5%
1 98846
 
7.5%
: 98846
 
7.5%
, 59668
 
4.5%
3 49424
 
3.8%
{ 49423
 
3.8%
} 49423
 
3.8%
Other values (7) 90937
 
6.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2418654
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 395379
16.3%
227881
 
9.4%
e 130095
 
5.4%
a 119929
 
5.0%
i 107991
 
4.5%
6 98847
 
4.1%
_ 98846
 
4.1%
1 98846
 
4.1%
: 98846
 
4.1%
n 96933
 
4.0%
Other values (59) 945061
39.1%

release_date
Text

Distinct17336
Distinct (%)38.2%
Missing87
Missing (%)0.2%
Memory size2.9 MiB

Length

Max length10
Median length10
Mean length9.999449084
Min length1

Characters and Unicode

Total characters453765
Distinct characters11
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique8573 ?
Unique (%)18.9%

Sample

1st row1995-10-30
2nd row1995-12-15
3rd row1995-12-22
4th row1995-12-22
5th row1995-02-10
ValueCountFrequency (%)
2008-01-01 136
 
0.3%
2009-01-01 121
 
0.3%
2007-01-01 118
 
0.3%
2005-01-01 111
 
0.2%
2006-01-01 101
 
0.2%
2002-01-01 96
 
0.2%
2004-01-01 90
 
0.2%
2001-01-01 84
 
0.2%
2003-01-01 76
 
0.2%
1997-01-01 69
 
0.2%
Other values (17326) 44377
97.8%
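Every one of the ten most common release_date values falls on 1 January (2008-01-01, 2009-01-01, 2007-01-01 and so on), which may indicate that year-only dates were stored as January 1st; that reading is an assumption. Min length 1 also suggests at least one value is not a full YYYY-MM-DD date. A minimal sketch that parses the strings with `datetime.date.fromisoformat`, returns None for anything unparseable, and flags January 1st dates for review (both function names are hypothetical):

```python
from datetime import date


def parse_release_date(cell):
    """Return a datetime.date for an ISO 'YYYY-MM-DD' string, else None."""
    if not isinstance(cell, str):
        return None
    try:
        return date.fromisoformat(cell)
    except ValueError:
        return None


def is_jan_first(d):
    """Flag January 1st, which may be a year-only placeholder (assumption)."""
    return d is not None and d.month == 1 and d.day == 1
```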

Most occurring characters

ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84056
18.5%
2 52806
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 363013
80.0%
Dash Punctuation 90752
 
20.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 97600
26.9%
1 84056
23.2%
2 52806
14.5%
9 39773
11.0%
3 15435
 
4.3%
8 15279
 
4.2%
6 15021
 
4.1%
5 14836
 
4.1%
7 14289
 
3.9%
4 13918
 
3.8%
Dash Punctuation
ValueCountFrequency (%)
- 90752
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 453765
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84056
18.5%
2 52806
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 453765
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 97600
21.5%
- 90752
20.0%
1 84056
18.5%
2 52806
11.6%
9 39773
8.8%
3 15435
 
3.4%
8 15279
 
3.4%
6 15021
 
3.3%
5 14836
 
3.3%
7 14289
 
3.1%

revenue
Real number (ℝ)

ZEROS 

Distinct6863
Distinct (%)15.1%
Missing6
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean11209348.54
Minimum0
Maximum2787965087
Zeros38052
Zeros (%)83.7%
Negative0
Negative (%)0.0%
Memory size355.3 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile47808918.5
Maximum2787965087
Range2787965087
Interquartile range (IQR)0

Descriptive statistics

Standard deviation64332246.74
Coefficient of variation (CV)5.739160176
Kurtosis237.5105858
Mean11209348.54
Median Absolute Deviation (MAD)0
Skewness12.26598291
Sum5.095769846 × 10¹¹
Variance4.138637971 × 10¹⁵
MonotonicityNot monotonic
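Several of these figures follow directly from one another. The coefficient of variation is the standard deviation divided by the mean (64332246.74 / 11209348.54 ≈ 5.739), and the IQR is Q3 − Q1. The following is a minimal standard-library sketch of those definitions. It assumes the sample standard deviation (n − 1 denominator), linearly interpolated quartiles, and an unscaled MAD taken around the median. These choices are consistent with the figures in this report but may not match the profiler's internals in every edge case.

```python
import statistics

def numeric_summary(values):
    """Summary statistics for a list of numbers with missing values already removed."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # "inclusive" quartiles interpolate linearly between order statistics
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    mad = statistics.median(abs(x - median) for x in values)  # no 1.4826 scaling
    zeros = sum(1 for x in values if x == 0)
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
        "mad": mad,  # median absolute deviation around the median
        "zeros_pct": 100 * zeros / len(values),
        "sum": sum(values),
    }

summary = numeric_summary([1, 2, 3, 4, 100])
print(summary["iqr"], summary["mad"])
```

Applied to the non-missing `revenue` values, `cv` would be expected to reproduce the 5.739 above, because that figure is exactly the quoted standard deviation divided by the quoted mean.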
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
0 38052
83.7%
12000000 20
 
< 0.1%
10000000 19
 
< 0.1%
11000000 19
 
< 0.1%
2000000 18
 
< 0.1%
6000000 17
 
< 0.1%
5000000 14
 
< 0.1%
500000 13
 
< 0.1%
8000000 13
 
< 0.1%
1 12
 
< 0.1%
Other values (6853) 7263
 
16.0%
Minimum 5 values
ValueCountFrequency (%)
0 38052
83.7%
1 12
 
< 0.1%
2 3
 
< 0.1%
3 9
 
< 0.1%
4 4
 
< 0.1%
Maximum 5 values
ValueCountFrequency (%)
2787965087 1
< 0.1%
2068223624 1
< 0.1%
1845034188 1
< 0.1%
1519557910 1
< 0.1%
1513528810 1
< 0.1%

runtime
Real number (ℝ)

ZEROS 

Distinct353
Distinct (%)0.8%
Missing263
Missing (%)0.6%
Infinite0
Infinite (%)0.0%
Mean94.12819946
Minimum0
Maximum1256
Zeros1558
Zeros (%)3.4%
Negative0
Negative (%)0.0%
Memory size355.3 KiB

Quantile statistics

Minimum0
5-th percentile11
Q185
median95
Q3107
95-th percentile138
Maximum1256
Range1256
Interquartile range (IQR)22

Descriptive statistics

Standard deviation38.40781049
Coefficient of variation (CV)0.4080372376
Kurtosis93.21715769
Mean94.12819946
Median Absolute Deviation (MAD)11
Skewness4.465957935
Sum4254877
Variance1475.159906
MonotonicityNot monotonic
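The runtime figures can be checked by hand. There are 45466 − 263 = 45203 non-missing values. From these and the statistics above, the sum, coefficient of variation, IQR, and variance follow directly (values rounded):

```latex
\begin{aligned}
\text{Sum} &= \bar{x}\,n = 94.12819946 \times 45203 \approx 4254877 \\
\text{CV} &= \frac{s}{\bar{x}} = \frac{38.40781049}{94.12819946} \approx 0.4080 \\
\text{IQR} &= Q_3 - Q_1 = 107 - 85 = 22 \\
s^2 &= 38.40781049^2 \approx 1475.16
\end{aligned}
```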
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
90 2556
 
5.6%
0 1558
 
3.4%
100 1470
 
3.2%
95 1412
 
3.1%
93 1214
 
2.7%
96 1104
 
2.4%
92 1080
 
2.4%
94 1062
 
2.3%
91 1057
 
2.3%
88 1032
 
2.3%
Other values (343) 31658
69.6%
Minimum 5 values
ValueCountFrequency (%)
0 1558
3.4%
1 107
 
0.2%
2 33
 
0.1%
3 48
 
0.1%
4 51
 
0.1%
Maximum 5 values
ValueCountFrequency (%)
1256 1
< 0.1%
1140 2
< 0.1%
931 1
< 0.1%
925 1
< 0.1%
900 1
< 0.1%

spoken_languages
Text

Distinct1931
Distinct (%)4.2%
Missing6
Missing (%)< 0.1%
Memory size5.3 MiB

Length

Max length765
Median length40
Mean length46.92828861
Min length2

Characters and Unicode

Total characters2133360
Distinct characters184
Distinct categories11
Distinct scripts15
Distinct blocks16

Unique

Unique1366
Unique (%)3.0%

Sample

1st row[{'iso_639_1': 'en', 'name': 'English'}]
2nd row[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]
3rd row[{'iso_639_1': 'en', 'name': 'English'}]
4th row[{'iso_639_1': 'en', 'name': 'English'}]
5th row[{'iso_639_1': 'en', 'name': 'English'}]
ValueCountFrequency (%)
iso_639_1 53300
24.4%
name 53300
24.4%
english 28745
13.2%
en 28745
13.2%
4809
 
2.2%
fr 4196
 
1.9%
français 4196
 
1.9%
deutsch 2625
 
1.2%
de 2625
 
1.2%
es 2413
 
1.1%
Other values (203) 33488
15.3%
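The word table above tokenizes the raw string. That is why the dictionary keys `iso_639_1` and `name` top the list, with 53300 occurrences each, one per language entry. The sample rows are Python-literal lists of dictionaries, so the per-film language codes can be recovered directly. Below is a minimal sketch using the standard-library `ast.literal_eval`. The helper name `parse_languages` is hypothetical, and the sketch assumes every non-missing value is a well-formed literal.

```python
import ast
from collections import Counter

def parse_languages(raw):
    """Return the ISO 639-1 codes from one spoken_languages value.

    Missing values (None, NaN, empty strings) yield an empty list.
    """
    if not isinstance(raw, str) or not raw.strip():
        return []
    entries = ast.literal_eval(raw)  # evaluates Python literals only, never arbitrary code
    return [entry["iso_639_1"] for entry in entries]

rows = [
    "[{'iso_639_1': 'en', 'name': 'English'}]",
    "[{'iso_639_1': 'en', 'name': 'English'}, {'iso_639_1': 'fr', 'name': 'Français'}]",
]
language_counts = Counter(code for raw in rows for code in parse_languages(raw))
print(language_counts)  # Counter({'en': 2, 'fr': 1})
```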

Most occurring characters

ValueCountFrequency (%)
' 426400
20.0%
172982
 
8.1%
n 120605
 
5.7%
_ 106600
 
5.0%
: 106600
 
5.0%
s 99222
 
4.7%
i 94120
 
4.4%
e 92748
 
4.3%
a 75235
 
3.5%
, 64969
 
3.0%
Other values (174) 773879
36.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 771936
36.2%
Other Punctuation 599060
28.1%
Decimal Number 213226
 
10.0%
Space Separator 172982
 
8.1%
Connector Punctuation 106600
 
5.0%
Close Punctuation 98760
 
4.6%
Open Punctuation 98760
 
4.6%
Uppercase Letter 46453
 
2.2%
Other Letter 22196
 
1.0%
Spacing Mark 1838
 
0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 120605
15.6%
s 99222
12.9%
i 94120
12.2%
e 92748
12.0%
a 75235
9.7%
o 61255
7.9%
m 54012
7.0%
l 36054
 
4.7%
h 33831
 
4.4%
g 30529
 
4.0%
Other values (65) 74325
9.6%
Other Letter
ValueCountFrequency (%)
1758
 
7.9%
1758
 
7.9%
1758
 
7.9%
1263
 
5.7%
946
 
4.3%
790
 
3.6%
790
 
3.6%
707
 
3.2%
707
 
3.2%
707
 
3.2%
Other values (46) 11012
49.6%
Uppercase Letter
ValueCountFrequency (%)
E 31215
67.2%
F 4198
 
9.0%
D 2927
 
6.3%
P 2678
 
5.8%
I 2367
 
5.1%
N 830
 
1.8%
L 506
 
1.1%
M 363
 
0.8%
T 308
 
0.7%
Č 284
 
0.6%
Other values (13) 777
 
1.7%
Spacing Mark
ValueCountFrequency (%)
707
38.5%
ि 707
38.5%
136
 
7.4%
ி 111
 
6.0%
94
 
5.1%
47
 
2.6%
18
 
1.0%
18
 
1.0%
Other Punctuation
ValueCountFrequency (%)
' 426400
71.2%
: 106600
 
17.8%
, 64969
 
10.8%
/ 1015
 
0.2%
? 50
 
< 0.1%
\ 26
 
< 0.1%
Nonspacing Mark
ValueCountFrequency (%)
707
45.6%
ִ 430
27.8%
ְ 215
 
13.9%
111
 
7.2%
68
 
4.4%
18
 
1.2%
Decimal Number
ValueCountFrequency (%)
9 53326
25.0%
3 53300
25.0%
6 53300
25.0%
1 53300
25.0%
Close Punctuation
ValueCountFrequency (%)
} 53300
54.0%
] 45460
46.0%
Open Punctuation
ValueCountFrequency (%)
{ 53300
54.0%
[ 45460
46.0%
Space Separator
ValueCountFrequency (%)
172982
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 106600
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1289388
60.4%
Latin 805994
37.8%
Han 10482
 
0.5%
Cyrillic 10460
 
0.5%
Devanagari 4242
 
0.2%
Arabic 3349
 
0.2%
Hangul 3252
 
0.2%
Hebrew 1720
 
0.1%
Greek 1704
 
0.1%
Thai 1232
 
0.1%
Other values (5) 1537
 
0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 120605
15.0%
s 99222
12.3%
i 94120
11.7%
e 92748
11.5%
a 75235
9.3%
o 61255
7.6%
m 54012
6.7%
l 36054
 
4.5%
h 33831
 
4.2%
E 31215
 
3.9%
Other values (52) 107697
13.4%
Cyrillic
ValueCountFrequency (%)
с 3213
30.7%
к 1735
16.6%
и 1680
16.1%
й 1616
15.4%
у 1565
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
Common
ValueCountFrequency (%)
' 426400
33.1%
172982
13.4%
_ 106600
 
8.3%
: 106600
 
8.3%
, 64969
 
5.0%
9 53326
 
4.1%
} 53300
 
4.1%
{ 53300
 
4.1%
3 53300
 
4.1%
6 53300
 
4.1%
Other values (6) 145311
 
11.3%
Arabic
ValueCountFrequency (%)
ا 538
16.1%
ر 538
16.1%
ي 341
10.2%
ب 341
10.2%
ع 341
10.2%
ل 341
10.2%
ة 341
10.2%
ف 142
 
4.2%
س 142
 
4.2%
ی 142
 
4.2%
Other values (5) 142
 
4.2%
Han
ValueCountFrequency (%)
1758
16.8%
1758
16.8%
1758
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
473
 
4.5%
广 473
 
4.5%
473
 
4.5%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
ע 215
12.5%
ר 215
12.5%
י 215
12.5%
ְ 215
12.5%
ב 215
12.5%
Greek
ValueCountFrequency (%)
λ 426
25.0%
ν 213
12.5%
ά 213
12.5%
κ 213
12.5%
η 213
12.5%
ε 213
12.5%
ι 213
12.5%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Devanagari
ValueCountFrequency (%)
707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
ि 707
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Thai
ValueCountFrequency (%)
352
28.6%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
111
20.0%
111
20.0%
ி 111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2086548
97.8%
CJK 10482
 
0.5%
Cyrillic 10460
 
0.5%
None 10412
 
0.5%
Devanagari 4242
 
0.2%
Arabic 3349
 
0.2%
Hangul 3252
 
0.2%
Hebrew 1720
 
0.1%
Thai 1232
 
0.1%
Tamil 555
 
< 0.1%
Other values (6) 1108
 
0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
' 426400
20.4%
172982
 
8.3%
n 120605
 
5.8%
_ 106600
 
5.1%
: 106600
 
5.1%
s 99222
 
4.8%
i 94120
 
4.5%
e 92748
 
4.4%
a 75235
 
3.6%
, 64969
 
3.1%
Other values (52) 727067
34.8%
None
ValueCountFrequency (%)
ç 4443
42.7%
ñ 2413
23.2%
ê 591
 
5.7%
λ 426
 
4.1%
ý 284
 
2.7%
Č 284
 
2.7%
ü 247
 
2.4%
ν 213
 
2.0%
ά 213
 
2.0%
κ 213
 
2.0%
Other values (10) 1085
 
10.4%
Cyrillic
ValueCountFrequency (%)
с 3213
30.7%
к 1735
16.6%
и 1680
16.1%
й 1616
15.4%
у 1565
15.0%
а 113
 
1.1%
р 87
 
0.8%
У 53
 
0.5%
ї 53
 
0.5%
н 53
 
0.5%
Other values (12) 292
 
2.8%
CJK
ValueCountFrequency (%)
1758
16.8%
1758
16.8%
1758
16.8%
1263
12.0%
946
9.0%
790
7.5%
790
7.5%
473
 
4.5%
广 473
 
4.5%
473
 
4.5%
Devanagari
ValueCountFrequency (%)
707
16.7%
707
16.7%
707
16.7%
707
16.7%
707
16.7%
ि 707
16.7%
Hangul
ValueCountFrequency (%)
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
542
16.7%
Arabic
ValueCountFrequency (%)
ا 538
16.1%
ر 538
16.1%
ي 341
10.2%
ب 341
10.2%
ع 341
10.2%
ل 341
10.2%
ة 341
10.2%
ف 142
 
4.2%
س 142
 
4.2%
ی 142
 
4.2%
Other values (5) 142
 
4.2%
Hebrew
ValueCountFrequency (%)
ִ 430
25.0%
ת 215
12.5%
ע 215
12.5%
ר 215
12.5%
י 215
12.5%
ְ 215
12.5%
ב 215
12.5%
Thai
ValueCountFrequency (%)
352
28.6%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
176
14.3%
Telugu
ValueCountFrequency (%)
136
33.3%
68
16.7%
68
16.7%
68
16.7%
68
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
111
20.0%
111
20.0%
ி 111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
94
40.0%
47
20.0%
47
20.0%
47
20.0%
Latin Ext Additional
ValueCountFrequency (%)
ế 61
50.0%
61
50.0%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Gurmukhi
ValueCountFrequency (%)
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
18
16.7%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%

status
Text

Distinct6
Distinct (%)< 0.1%
Missing87
Missing (%)0.2%
Memory size2.8 MiB

Length

Max length15
Median length8
Mean length8.011921814
Min length7

Characters and Unicode

Total characters363573
Distinct characters18
Distinct categories3
Distinct scripts2
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st rowReleased
2nd rowReleased
3rd rowReleased
4th rowReleased
5th rowReleased
ValueCountFrequency (%)
released 45014
98.9%
rumored 230
 
0.5%
production 118
 
0.3%
post 98
 
0.2%
in 20
 
< 0.1%
planned 15
 
< 0.1%
canceled 2
 
< 0.1%
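This table counts lower-cased words, not whole values, so multi-word statuses are split. For example, `production` (118) combines "Post Production" (98) and "In Production" (20). Counting whole values instead recovers the six distinct statuses. Their counts, 45014 + 230 + 98 + 20 + 15 + 2 = 45379, account for every non-missing row (45466 − 87). Below is a minimal sketch contrasting the two views with `collections.Counter`, using a small hypothetical sample.

```python
from collections import Counter

# Hypothetical sample of raw status values
statuses = ["Released", "Released", "Post Production", "In Production", "Rumored"]

# Whole-value counts: what the "Distinct6" figure refers to
value_counts = Counter(statuses)
# Lower-cased word counts: the view used by the table above
word_counts = Counter(word.lower() for s in statuses for word in s.split())

print(value_counts["Post Production"], value_counts["In Production"])  # 1 1
print(word_counts["production"])  # 2
```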

Most occurring characters

ValueCountFrequency (%)
e 135291
37.2%
d 45379
 
12.5%
R 45244
 
12.4%
s 45112
 
12.4%
l 45031
 
12.4%
a 45031
 
12.4%
o 564
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
P 231
 
0.1%
Other values (8) 994
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 317958
87.5%
Uppercase Letter 45497
 
12.5%
Space Separator 118
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 135291
42.5%
d 45379
 
14.3%
s 45112
 
14.2%
l 45031
 
14.2%
a 45031
 
14.2%
o 564
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
m 230
 
0.1%
t 216
 
0.1%
Other values (3) 408
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45244
99.4%
P 231
 
0.5%
I 20
 
< 0.1%
C 2
 
< 0.1%
Space Separator
ValueCountFrequency (%)
118
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 363455
> 99.9%
Common 118
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 135291
37.2%
d 45379
 
12.5%
R 45244
 
12.4%
s 45112
 
12.4%
l 45031
 
12.4%
a 45031
 
12.4%
o 564
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
P 231
 
0.1%
Other values (7) 876
 
0.2%
Common
ValueCountFrequency (%)
118
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 363573
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 135291
37.2%
d 45379
 
12.5%
R 45244
 
12.4%
s 45112
 
12.4%
l 45031
 
12.4%
a 45031
 
12.4%
o 564
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
P 231
 
0.1%
Other values (8) 994
 
0.3%

tagline
Text

MISSING 

Distinct20283
Distinct (%)99.4%
Missing25054
Missing (%)55.1%
Memory size2.8 MiB

Length

Max length297
Median length204
Mean length47.00284147
Min length1

Characters and Unicode

Total characters959422
Distinct characters170
Distinct categories17
Distinct scripts6
Distinct blocks10

Unique

Unique20177
Unique (%)98.8%

Sample

1st rowRoll the dice and unleash the excitement!
2nd rowStill Yelling. Still Fighting. Still Ready for Love.
3rd rowFriends are the people who let you be yourself... and never let you forget it.
4th rowJust When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th rowA Los Angeles Crime Saga
ValueCountFrequency (%)
the 11004
 
6.3%
a 6820
 
3.9%
of 4406
 
2.5%
to 3586
 
2.1%
is 2800
 
1.6%
in 2693
 
1.5%
and 2686
 
1.5%
you 2389
 
1.4%
1585
 
0.9%
for 1524
 
0.9%
Other values (15108) 134566
77.3%

Most occurring characters

ValueCountFrequency (%)
153795
16.0%
e 94486
 
9.8%
t 57309
 
6.0%
o 56611
 
5.9%
a 51521
 
5.4%
n 47539
 
5.0%
i 46086
 
4.8%
r 45029
 
4.7%
s 42399
 
4.4%
h 37192
 
3.9%
Other values (160) 327455
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 681040
71.0%
Space Separator 153795
 
16.0%
Uppercase Letter 75028
 
7.8%
Other Punctuation 44604
 
4.6%
Decimal Number 2687
 
0.3%
Dash Punctuation 1948
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 56
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94486
13.9%
t 57309
 
8.4%
o 56611
 
8.3%
a 51521
 
7.6%
n 47539
 
7.0%
i 46086
 
6.8%
r 45029
 
6.6%
s 42399
 
6.2%
h 37192
 
5.5%
l 30199
 
4.4%
Other values (43) 172669
25.4%
Other Letter
ValueCountFrequency (%)
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Other values (24) 24
70.6%
Uppercase Letter
ValueCountFrequency (%)
T 10013
 
13.3%
A 6878
 
9.2%
S 5653
 
7.5%
H 4404
 
5.9%
I 4387
 
5.8%
E 4307
 
5.7%
W 3683
 
4.9%
O 3479
 
4.6%
L 3196
 
4.3%
N 3196
 
4.3%
Other values (20) 25832
34.4%
Other Punctuation
ValueCountFrequency (%)
. 26655
59.8%
! 5785
 
13.0%
' 5676
 
12.7%
, 4231
 
9.5%
? 1161
 
2.6%
" 582
 
1.3%
148
 
0.3%
: 138
 
0.3%
& 84
 
0.2%
* 42
 
0.1%
Other values (7) 102
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
3 208
 
7.7%
9 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
7 121
 
4.5%
6 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
= 5
35.7%
+ 5
35.7%
| 2
 
14.3%
~ 1
 
7.1%
1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1931
99.1%
9
 
0.5%
8
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
87.5%
[ 7
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Space Separator
ValueCountFrequency (%)
153795
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 756068
78.8%
Common 203319
 
21.2%
Han 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 94486
 
12.5%
t 57309
 
7.6%
o 56611
 
7.5%
a 51521
 
6.8%
n 47539
 
6.3%
i 46086
 
6.1%
r 45029
 
6.0%
s 42399
 
5.6%
h 37192
 
4.9%
l 30199
 
4.0%
Other values (73) 247697
32.8%
Common
ValueCountFrequency (%)
153795
75.6%
. 26655
 
13.1%
! 5785
 
2.8%
' 5676
 
2.8%
, 4231
 
2.1%
- 1931
 
0.9%
? 1161
 
0.6%
0 802
 
0.4%
" 582
 
0.3%
1 516
 
0.3%
Other values (42) 2185
 
1.1%
Han
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 958992
> 99.9%
Punctuation 280
 
< 0.1%
None 110
 
< 0.1%
CJK 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153795
16.0%
e 94486
 
9.9%
t 57309
 
6.0%
o 56611
 
5.9%
a 51521
 
5.4%
n 47539
 
5.0%
i 46086
 
4.8%
r 45029
 
4.7%
s 42399
 
4.4%
h 37192
 
3.9%
Other values (78) 327025
34.1%
Punctuation
ValueCountFrequency (%)
148
52.9%
82
29.3%
15
 
5.4%
14
 
5.0%
9
 
3.2%
8
 
2.9%
4
 
1.4%
None
ValueCountFrequency (%)
é 18
16.4%
ä 16
14.5%
ö 8
 
7.3%
á 6
 
5.5%
ó 6
 
5.5%
í 5
 
4.5%
ü 5
 
4.5%
ı 5
 
4.5%
· 4
 
3.6%
ñ 3
 
2.7%
Other values (26) 34
30.9%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
CJK
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Modifier Letters
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Text

Distinct42277
Distinct (%)93.0%
Missing6
Missing (%)< 0.1%
Memory size3.2 MiB

Length

Max length105
Median length79
Mean length16.70853498
Min length1

Characters and Unicode

Total characters759570
Distinct characters287
Distinct categories17
Distinct scripts7
Distinct blocks12

Unique

Unique39947
Unique (%)87.9%

Sample

1st rowToy Story
2nd rowJumanji
3rd rowGrumpier Old Men
4th rowWaiting to Exhale
5th rowFather of the Bride Part II
ValueCountFrequency (%)
the 14571
 
10.7%
of 4938
 
3.6%
a 2244
 
1.6%
in 1697
 
1.2%
and 1634
 
1.2%
to 1055
 
0.8%
763
 
0.6%
man 665
 
0.5%
love 664
 
0.5%
for 602
 
0.4%
Other values (24431) 107634
78.9%

Most occurring characters

ValueCountFrequency (%)
91029
 
12.0%
e 76408
 
10.1%
a 49056
 
6.5%
o 45765
 
6.0%
n 40931
 
5.4%
r 40096
 
5.3%
i 39859
 
5.2%
t 36792
 
4.8%
s 29591
 
3.9%
h 28564
 
3.8%
Other values (277) 281479
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 535372
70.5%
Uppercase Letter 117493
 
15.5%
Space Separator 91029
 
12.0%
Other Punctuation 10513
 
1.4%
Decimal Number 3863
 
0.5%
Dash Punctuation 986
 
0.1%
Close Punctuation 87
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Final Punctuation 38
 
< 0.1%
Other Letter 25
 
< 0.1%
Other values (7) 79
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76408
14.3%
a 49056
9.2%
o 45765
 
8.5%
n 40931
 
7.6%
r 40096
 
7.5%
i 39859
 
7.4%
t 36792
 
6.9%
s 29591
 
5.5%
h 28564
 
5.3%
l 25992
 
4.9%
Other values (121) 122318
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 16037
13.6%
S 10354
 
8.8%
M 8042
 
6.8%
B 7674
 
6.5%
C 7175
 
6.1%
A 6808
 
5.8%
D 6355
 
5.4%
L 5883
 
5.0%
H 5183
 
4.4%
W 5175
 
4.4%
Other values (65) 38807
33.0%
Other Letter
ValueCountFrequency (%)
ی 2
 
8.0%
ک 2
 
8.0%
چ 2
 
8.0%
ه 2
 
8.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
ª 1
 
4.0%
ا 1
 
4.0%
Other values (11) 11
44.0%
Other Punctuation
ValueCountFrequency (%)
: 3727
35.5%
' 2512
23.9%
. 1604
15.3%
, 1136
 
10.8%
! 648
 
6.2%
& 460
 
4.4%
? 269
 
2.6%
/ 80
 
0.8%
* 19
 
0.2%
# 13
 
0.1%
Other values (8) 45
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 864
22.4%
1 699
18.1%
0 619
16.0%
3 484
12.5%
9 230
 
6.0%
4 229
 
5.9%
5 225
 
5.8%
7 196
 
5.1%
8 161
 
4.2%
6 156
 
4.0%
Math Symbol
ValueCountFrequency (%)
+ 17
70.8%
× 3
 
12.5%
= 1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
Other Number
ValueCountFrequency (%)
½ 12
63.2%
² 3
 
15.8%
³ 2
 
10.5%
1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
37.5%
2
25.0%
1
 
12.5%
1
 
12.5%
1
 
12.5%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Dash Punctuation
ValueCountFrequency (%)
- 971
98.5%
15
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Open Punctuation
ValueCountFrequency (%)
( 80
94.1%
[ 5
 
5.9%
Final Punctuation
ValueCountFrequency (%)
37
97.4%
1
 
2.6%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
91029
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 652335
85.9%
Common 106680
 
14.0%
Cyrillic 361
 
< 0.1%
Greek 170
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Han 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76408
 
11.7%
a 49056
 
7.5%
o 45765
 
7.0%
n 40931
 
6.3%
r 40096
 
6.1%
i 39859
 
6.1%
t 36792
 
5.6%
s 29591
 
4.5%
h 28564
 
4.4%
l 25992
 
4.0%
Other values (107) 239281
36.7%
Common
ValueCountFrequency (%)
91029
85.3%
: 3727
 
3.5%
' 2512
 
2.4%
. 1604
 
1.5%
, 1136
 
1.1%
- 971
 
0.9%
2 864
 
0.8%
1 699
 
0.7%
! 648
 
0.6%
0 619
 
0.6%
Other values (50) 2871
 
2.7%
Cyrillic
ValueCountFrequency (%)
е 33
 
9.1%
о 32
 
8.9%
а 32
 
8.9%
н 26
 
7.2%
и 24
 
6.6%
р 23
 
6.4%
к 17
 
4.7%
в 16
 
4.4%
с 15
 
4.2%
л 14
 
3.9%
Other values (38) 129
35.7%
Greek
ValueCountFrequency (%)
α 20
 
11.8%
ι 14
 
8.2%
ο 14
 
8.2%
τ 9
 
5.3%
λ 8
 
4.7%
ρ 8
 
4.7%
ά 8
 
4.7%
ν 7
 
4.1%
ε 6
 
3.5%
π 6
 
3.5%
Other values (32) 70
41.2%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
ی 2
18.2%
ک 2
18.2%
چ 2
18.2%
ه 2
18.2%
ا 1
9.1%
س 1
9.1%
ج 1
9.1%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 757982
99.8%
None 1132
 
0.1%
Cyrillic 361
 
< 0.1%
Punctuation 62
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
CJK 5
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Letterlike Symbols 2
 
< 0.1%
Math Operators 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
91029
 
12.0%
e 76408
 
10.1%
a 49056
 
6.5%
o 45765
 
6.0%
n 40931
 
5.4%
r 40096
 
5.3%
i 39859
 
5.3%
t 36792
 
4.9%
s 29591
 
3.9%
h 28564
 
3.8%
Other values (76) 279891
36.9%
None
ValueCountFrequency (%)
é 218
19.3%
ä 128
 
11.3%
ö 56
 
4.9%
è 54
 
4.8%
ô 44
 
3.9%
ü 39
 
3.4%
ó 37
 
3.3%
á 35
 
3.1%
ı 35
 
3.1%
à 33
 
2.9%
Other values (108) 453
40.0%
Punctuation
ValueCountFrequency (%)
37
59.7%
15
24.2%
5
 
8.1%
2
 
3.2%
1
 
1.6%
1
 
1.6%
1
 
1.6%
Cyrillic
ValueCountFrequency (%)
е 33
 
9.1%
о 32
 
8.9%
а 32
 
8.9%
н 26
 
7.2%
и 24
 
6.6%
р 23
 
6.4%
к 17
 
4.7%
в 16
 
4.4%
с 15
 
4.2%
л 14
 
3.9%
Other values (38) 129
35.7%
Arabic
ValueCountFrequency (%)
ی 2
18.2%
ک 2
18.2%
چ 2
18.2%
ه 2
18.2%
ا 1
9.1%
س 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Math Operators
ValueCountFrequency (%)
1
50.0%
1
50.0%
Arrows
ValueCountFrequency (%)
1
100.0%

video
Boolean

IMBALANCE 

Distinct2
Distinct (%)< 0.1%
Missing6
Missing (%)< 0.1%
Memory size1.6 MiB
ValueCountFrequency (%)
False 45367
99.8%
True 93
 
0.2%
(Missing) 6
 
< 0.1%
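The IMBALANCE flag can be reproduced from the counts above under one assumption: that the profiler scores imbalance as one minus the Shannon entropy of the class distribution, normalized by the logarithm of the number of classes. This score is 0 for perfectly balanced classes and approaches 1 as one class dominates. With 45367 False and 93 True (missing values excluded), it comes to about 0.979, matching the 97.9% in the alert list. A minimal sketch:

```python
import math

def imbalance_score(counts):
    """Return 1 - H / ln(k), where H is the Shannon entropy (in nats) of the
    class proportions and k is the number of classes."""
    k = len(counts)
    if k < 2:
        raise ValueError("need at least two classes")
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    return 1.0 - entropy / math.log(k)

score = imbalance_score({"False": 45367, "True": 93})
print(f"{score:.1%}")  # 97.9%
```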

vote_average
Real number (ℝ)

ZEROS 

Distinct92
Distinct (%)0.2%
Missing6
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean5.618207215
Minimum0
Maximum10
Zeros2998
Zeros (%)6.6%
Negative0
Negative (%)0.0%
Memory size355.3 KiB

Quantile statistics

Minimum0
5-th percentile0
Q15
median6
Q36.8
95-th percentile7.8
Maximum10
Range10
Interquartile range (IQR)1.8

Descriptive statistics

Standard deviation1.924215992
Coefficient of variation (CV)0.3424964438
Kurtosis2.500402198
Mean5.618207215
Median Absolute Deviation (MAD)0.9
Skewness-1.518990058
Sum255403.7
Variance3.702607182
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
0 2998
 
6.6%
6 2468
 
5.4%
5 2001
 
4.4%
7 1886
 
4.1%
6.5 1722
 
3.8%
6.3 1603
 
3.5%
5.5 1381
 
3.0%
5.8 1369
 
3.0%
6.4 1350
 
3.0%
6.7 1342
 
3.0%
Other values (82) 27340
60.1%
Minimum 5 values
ValueCountFrequency (%)
0 2998
6.6%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 105
 
0.2%
1.1 1
 
< 0.1%
Maximum 5 values
ValueCountFrequency (%)
10 190
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%

vote_count
Real number (ℝ)

ZEROS 

Distinct1820
Distinct (%)4.0%
Missing6
Missing (%)< 0.1%
Infinite0
Infinite (%)0.0%
Mean109.8973383
Minimum0
Maximum14075
Zeros2899
Zeros (%)6.4%
Negative0
Negative (%)0.0%
Memory size355.3 KiB

Quantile statistics

Minimum0
5-th percentile0
Q13
median10
Q334
95-th percentile434
Maximum14075
Range14075
Interquartile range (IQR)31

Descriptive statistics

Standard deviation491.3103739
Coefficient of variation (CV)4.470630331
Kurtosis151.2028027
Mean109.8973383
Median Absolute Deviation (MAD)8
Skewness10.45023206
Sum4995933
Variance241385.8835
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Common values
ValueCountFrequency (%)
1 3264
 
7.2%
2 3132
 
6.9%
0 2899
 
6.4%
3 2787
 
6.1%
4 2480
 
5.5%
5 2097
 
4.6%
6 1747
 
3.8%
7 1570
 
3.5%
8 1359
 
3.0%
9 1194
 
2.6%
Other values (1810) 22931
50.4%
Minimum 5 values
ValueCountFrequency (%)
0 2899
6.4%
1 3264
7.2%
2 3132
6.9%
3 2787
6.1%
4 2480
5.5%
Maximum 5 values
ValueCountFrequency (%)
14075 1
< 0.1%
12269 1
< 0.1%
12114 1
< 0.1%
12000 1
< 0.1%
11444 1
< 0.1%